2024-08-09 12:33:18,291 INFO [train_multi_KD3.py:1187] (0/4) Training started
2024-08-09 12:33:18,308 INFO [train_multi_KD3.py:1197] (0/4) Device: cuda:0
2024-08-09 12:33:18,311 INFO [train_multi_KD3.py:1212] (0/4) Using dtype=torch.bfloat16
2024-08-09 12:33:18,311 INFO [train_multi_KD3.py:1214] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': 'a6c2f7a4-dirty', 'icefall-git-date': 'Thu Aug 8 16:21:21 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-09 12:33:18,311 INFO [train_multi_KD3.py:1216] (0/4) About to create model
2024-08-09 12:33:18,723 INFO [model_shift.py:142] (0/4) Delta_t: 6 when computing the distillation loss
2024-08-09 12:33:18,727 INFO [train_multi_KD3.py:1220] (0/4) Number of model parameters: 66484678
2024-08-09 12:33:20,724 INFO [train_multi_KD3.py:1235] (0/4) Using DDP
2024-08-09 12:33:22,044 INFO [kd_datamodule.py:690] (0/4) About to get train 960 cuts
2024-08-09 12:33:22,098 INFO [train_multi_KD3.py:1306] (0/4) Getting audioset cuts
2024-08-09 12:33:22,099 INFO [kd_datamodule.py:900] (0/4) About to get the audioset cuts for KD.
2024-08-09 12:33:22,100 INFO [kd_datamodule.py:869] (0/4) About to get the voxceleb cuts.
2024-08-09 12:33:22,103 INFO [kd_datamodule.py:880] (0/4) Adding voxceleb2 cuts.
2024-08-09 12:33:22,104 INFO [train_multi_KD3.py:1320] (0/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-09 12:33:30,946 INFO [train_multi_KD3.py:1322] (0/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-09 12:33:30,946 INFO [train_multi_KD3.py:1323] (0/4) Using weights: [1406195, 1904746, 1187704]
2024-08-09 12:33:30,946 INFO [train_multi_KD3.py:1332] (0/4) CutSet(len=4498645) [underlying data type: ]
2024-08-09 12:33:30,946 INFO [kd_datamodule.py:449] (0/4) Disable MUSAN
2024-08-09 12:33:30,946 INFO [kd_datamodule.py:489] (0/4) Disable SpecAugment
2024-08-09 12:33:30,946 INFO [kd_datamodule.py:491] (0/4) About to create train dataset
2024-08-09 12:33:30,947 INFO [kd_datamodule.py:528] (0/4) Using SimpleCutSampler
2024-08-09 12:33:30,948 INFO [kd_datamodule.py:536] (0/4) About to create train dataloader
2024-08-09 12:33:30,950 INFO [kd_datamodule.py:763] (0/4) About to get dev-clean cuts
2024-08-09 12:33:30,951 INFO [kd_datamodule.py:781] (0/4) About to get dev-other cuts
2024-08-09 12:33:30,953 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-09 12:33:31,234 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-09 12:33:31,234 INFO [kd_datamodule.py:840] (0/4) About to get the test set of voxceleb1 set.
2024-08-09 12:33:31,235 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-09 12:33:31,438 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-09 12:33:31,439 INFO [kd_datamodule.py:912] (0/4) About to get the audioset eval cuts.
2024-08-09 12:33:31,446 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-09 12:33:31,930 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-09 12:33:31,931 INFO [train_multi_KD3.py:1412] (0/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-09 12:33:47,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 0, loss[loss=1.246, beats_loss=0.9152, ecapa_loss=0.002358, whisper_loss=0.3077, over 20030.00 frames. ], tot_loss[loss=1.246, beats_loss=0.9152, ecapa_loss=0.002358, whisper_loss=0.3077, over 20030.00 frames. ], batch size: 87, lr: 2.25e-02, grad_scale: 2.0
2024-08-09 12:33:47,538 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-09 12:34:33,682 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on ASR_libri: loss=0.9193, beats_loss=0, ecapa_loss=0.006113, whisper_loss=0.8581, over 922467.00 frames.
2024-08-09 12:34:44,099 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1826, 2.7590, 3.1144, 3.1624], device='cuda:0')
2024-08-09 12:34:48,323 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on SV_voxceleb1: loss=0.05055, beats_loss=0, ecapa_loss=0.005055, whisper_loss=0, over 939242.00 frames.
2024-08-09 12:35:04,954 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4520, 3.4244, 3.4370, 3.4488], device='cuda:0')
2024-08-09 12:35:57,792 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5773, 4.5720, 4.6003, 4.5958], device='cuda:0')
2024-08-09 12:36:59,528 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on AT_audioset: loss=1.752, beats_loss=1.752, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-09 12:36:59,530 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-09 12:37:10,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=7.5
2024-08-09 12:37:13,089 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 from AS
2024-08-09 12:37:15,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=0.0, ans=7.5
2024-08-09 12:37:18,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=0.0, ans=0.5
2024-08-09 12:37:44,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=26.55 vs. limit=7.5375
2024-08-09 12:38:06,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=475.89 vs. limit=7.575
2024-08-09 12:38:16,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=300.0, ans=0.4859375
2024-08-09 12:38:20,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=300.0, ans=0.20450000000000002
2024-08-09 12:38:28,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=191.49 vs. limit=4.06
2024-08-09 12:38:35,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=300.0, ans=0.098125
2024-08-09 12:38:35,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=344.42 vs. limit=7.6125
2024-08-09 12:38:45,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=400.0, ans=7.65
2024-08-09 12:38:45,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=25.24 vs. limit=7.65
2024-08-09 12:38:45,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=178.35 vs. limit=7.65
2024-08-09 12:39:04,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 50, loss[loss=0.2182, beats_loss=0.017, ecapa_loss=0.002424, whisper_loss=0.177, over 20849.00 frames. ], tot_loss[loss=0.3427, beats_loss=0.1331, ecapa_loss=0.001972, whisper_loss=0.1899, over 917086.69 frames. ], batch size: 89, lr: 2.48e-02, grad_scale: 2.0
2024-08-09 12:39:07,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500.0, ans=0.295
2024-08-09 12:39:10,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=500.0, ans=5.125
2024-08-09 12:39:11,697 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS
2024-08-09 12:39:25,501 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS
2024-08-09 12:39:33,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=394.13 vs. limit=7.725
2024-08-09 12:39:51,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700.0, ans=0.293
2024-08-09 12:39:53,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=700.0, ans=0.4671875
2024-08-09 12:39:55,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=336.96 vs. limit=7.7625
2024-08-09 12:40:07,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=700.0, ans=0.17375000000000002
2024-08-09 12:40:13,954 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 from AS
2024-08-09 12:40:14,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=328.77 vs. limit=7.8
2024-08-09 12:40:20,081 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 from AS
2024-08-09 12:40:25,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=800.0, ans=0.872
2024-08-09 12:40:33,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=900.0, ans=0.4578125
2024-08-09 12:40:46,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=277.75 vs. limit=8.175
2024-08-09 12:40:47,887 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 22 from Vox, 30 from AS
2024-08-09 12:40:51,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.930e+01 4.445e+01 8.118e+01 2.890e+03, threshold=8.890e+01, percent-clipped=0.0
2024-08-09 12:40:51,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 100, loss[loss=0.2159, beats_loss=0.02171, ecapa_loss=0.001711, whisper_loss=0.177, over 20227.00 frames. ], tot_loss[loss=0.27, beats_loss=0.07232, ecapa_loss=0.001913, whisper_loss=0.1786, over 1568858.44 frames. ], batch size: 77, lr: 2.70e-02, grad_scale: 4.0
2024-08-09 12:40:58,946 WARNING [optim.py:496] (0/4) Scaling gradients by 0.048358626663684845, model_norm_threshold=88.8975601196289
2024-08-09 12:40:59,106 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.2.norm.log_scale with proportion 0.88, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.987e+06, grad_sumsq=2.987e+06, orig_rms_sq=1.000e+00
2024-08-09 12:41:03,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=17.79 vs. limit=5.25
2024-08-09 12:41:06,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=309.38 vs. limit=7.875
2024-08-09 12:41:08,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=41.43 vs. limit=7.9125
2024-08-09 12:41:08,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=434.47 vs. limit=7.9125
2024-08-09 12:41:13,940 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 from AS
2024-08-09 12:41:14,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1100.0, ans=0.289
2024-08-09 12:41:14,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=139.74 vs. limit=8.325
2024-08-09 12:41:21,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1100.0, ans=0.15875
2024-08-09 12:41:25,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1100.0, ans=0.0275
2024-08-09 12:41:33,111 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 18 from Vox, 34 from AS
2024-08-09 12:41:33,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=368.75 vs. limit=8.4
2024-08-09 12:41:34,601 WARNING [optim.py:496] (0/4) Scaling gradients by 0.011974900029599667, model_norm_threshold=88.8975601196289
2024-08-09 12:41:34,767 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.96, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.313e+07, grad_sumsq=5.313e+07, orig_rms_sq=1.000e+00
2024-08-09 12:41:36,566 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 from AS
2024-08-09 12:41:40,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=185.42 vs. limit=7.95
2024-08-09 12:41:45,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=254.87 vs. limit=8.475
2024-08-09 12:41:45,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=4.52
2024-08-09 12:41:49,444 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 27 from LS+wenet, 11 from Vox, 17 from AS
2024-08-09 12:41:50,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=265.74 vs. limit=7.9875
2024-08-09 12:41:51,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1300.0, ans=0.5
2024-08-09 12:41:52,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=37.21 vs. limit=5.0
2024-08-09 12:42:02,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=17.22 vs. limit=5.35
2024-08-09 12:42:05,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=142.18 vs. limit=5.7
2024-08-09 12:42:09,870 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 from AS
2024-08-09 12:42:15,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 150, loss[loss=0.1957, beats_loss=0.02065, ecapa_loss=0.001529, whisper_loss=0.1598, over 18057.00 frames. ], tot_loss[loss=0.2432, beats_loss=0.052, ecapa_loss=0.001875, whisper_loss=0.1724, over 2023758.95 frames. ], batch size: 67, lr: 2.93e-02, grad_scale: 4.0
2024-08-09 12:42:19,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1500.0, ans=0.5
2024-08-09 12:42:28,155 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04562794789671898, model_norm_threshold=88.8975601196289
2024-08-09 12:42:28,326 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.64, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.426e+06, grad_sumsq=2.426e+06, orig_rms_sq=1.000e+00
2024-08-09 12:42:29,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=44.14 vs. limit=5.375
2024-08-09 12:42:33,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=4.64
2024-08-09 12:42:45,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=72.80 vs. limit=5.8
2024-08-09 12:42:50,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1700.0, ans=0.4203125
2024-08-09 12:42:54,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1700.0, ans=0.4203125
2024-08-09 12:43:02,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=4.72
2024-08-09 12:43:08,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1800.0, ans=0.415625
2024-08-09 12:43:28,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=328.40 vs. limit=8.925
2024-08-09 12:43:30,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1900.0, ans=0.4109375
2024-08-09 12:43:33,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=304.40 vs. limit=9.0
2024-08-09 12:43:34,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+01 2.774e+01 3.640e+01 5.016e+01 7.424e+03, threshold=7.280e+01, percent-clipped=13.0
2024-08-09 12:43:34,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 200, loss[loss=0.1923, beats_loss=0.01777, ecapa_loss=0.001755, whisper_loss=0.157, over 14865.00 frames. ], tot_loss[loss=0.2297, beats_loss=0.04055, ecapa_loss=0.001861, whisper_loss=0.1706, over 2438765.35 frames. ], batch size: 55, lr: 3.15e-02, grad_scale: 8.0
2024-08-09 12:43:36,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=27.96 vs. limit=5.5
2024-08-09 12:43:38,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2000.0, ans=5.5
2024-08-09 12:43:44,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=86.75 vs. limit=8.25
2024-08-09 12:43:47,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=124.25 vs. limit=9.0
2024-08-09 12:43:52,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2100.0, ans=0.4015625
2024-08-09 12:43:54,409 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06407187134027481, model_norm_threshold=72.79639434814453
2024-08-09 12:43:54,589 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.47, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.083e+05, grad_sumsq=6.083e+05, orig_rms_sq=1.000e+00
2024-08-09 12:44:03,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=75.40 vs. limit=8.2875
2024-08-09 12:44:18,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=55.91 vs. limit=8.325
2024-08-09 12:44:18,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=90.03 vs. limit=8.325
2024-08-09 12:44:20,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=8.3625
2024-08-09 12:44:22,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=305.52 vs. limit=9.225
2024-08-09 12:44:23,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2300.0, ans=6.4375
2024-08-09 12:44:29,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2300.0, ans=0.11374999999999999
2024-08-09 12:44:33,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.24 vs. limit=9.225
2024-08-09 12:44:36,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=183.47 vs. limit=8.4
2024-08-09 12:44:38,693 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 31 from LS+wenet, 12 from Vox, 25 from AS
2024-08-09 12:44:42,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=15.04 vs. limit=5.6
2024-08-09 12:44:43,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=293.45 vs. limit=8.4
2024-08-09 12:44:46,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2400.0, ans=0.046
2024-08-09 12:44:46,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2400.0, ans=0.3875
2024-08-09 12:44:48,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=249.47 vs. limit=8.4
2024-08-09 12:44:51,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=180.74 vs. limit=9.375
2024-08-09 12:44:52,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 250, loss[loss=0.2254, beats_loss=0.01506, ecapa_loss=0.001748, whisper_loss=0.1929, over 23207.00 frames. ], tot_loss[loss=0.2213, beats_loss=0.03376, ecapa_loss=0.001835, whisper_loss=0.1692, over 2761383.73 frames. ], batch size: 89, lr: 3.38e-02, grad_scale: 8.0
2024-08-09 12:45:00,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2500.0, ans=0.3828125
2024-08-09 12:45:02,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=57.43 vs. limit=8.4375
2024-08-09 12:45:10,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=141.84 vs. limit=8.475
2024-08-09 12:45:12,744 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS
2024-08-09 12:45:12,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2600.0, ans=0.378125
2024-08-09 12:45:19,549 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 from AS
2024-08-09 12:45:36,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=83.83 vs. limit=8.5125
2024-08-09 12:45:39,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2800.0, ans=0.095
2024-08-09 12:45:40,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=23.46 vs. limit=6.4
2024-08-09 12:45:42,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=43.62 vs. limit=8.55
2024-08-09 12:45:46,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2800.0, ans=0.037000000000000005
2024-08-09 12:45:49,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=53.80 vs. limit=9.6
2024-08-09 12:45:51,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2800.0, ans=0.36875
2024-08-09 12:45:54,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2900.0, ans=0.081875
2024-08-09 12:46:04,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=13.76 vs. limit=5.725
2024-08-09 12:46:05,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2900.0, ans=0.3640625
2024-08-09 12:46:10,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.536e+01 4.623e+01 6.113e+01 1.136e+03, threshold=9.245e+01, percent-clipped=13.0
2024-08-09 12:46:10,786 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 300, loss[loss=0.1883, beats_loss=0.01742, ecapa_loss=0.00187, whisper_loss=0.1522, over 16829.00 frames. ], tot_loss[loss=0.214, beats_loss=0.02933, ecapa_loss=0.001803, whisper_loss=0.1666, over 3010206.72 frames. ], batch size: 67, lr: 3.60e-02, grad_scale: 8.0
2024-08-09 12:46:12,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3000.0, ans=0.040625
2024-08-09 12:46:14,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3000.0, ans=0.245
2024-08-09 12:46:15,577 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 36 from LS+wenet, 15 from Vox, 34 from AS
2024-08-09 12:46:18,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=9.75
2024-08-09 12:46:28,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3100.0, ans=0.11249999999999999
2024-08-09 12:46:31,456 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 from AS
2024-08-09 12:46:36,223 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 29 from Vox, 25 from AS
2024-08-09 12:46:36,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=110.32 vs. limit=9.825
2024-08-09 12:46:38,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=130.89 vs. limit=8.6625
2024-08-09 12:46:41,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=24.37 vs. limit=8.7
2024-08-09 12:46:42,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=5.28
2024-08-09 12:46:49,521 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 11 from Vox, 44 from AS
2024-08-09 12:46:52,912 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 23 from Vox, 21 from AS
2024-08-09 12:46:58,032 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 from AS
2024-08-09 12:47:00,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=31.09 vs. limit=6.65
2024-08-09 12:47:02,630 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 from AS
2024-08-09 12:47:05,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=31.18 vs. limit=8.7375
2024-08-09 12:47:07,348 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 22 from Vox, 46 from AS
2024-08-09 12:47:15,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=33.62 vs. limit=8.775
2024-08-09 12:47:16,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3400.0, ans=0.340625
2024-08-09 12:47:20,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3400.0, ans=0.340625
2024-08-09 12:47:28,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 350, loss[loss=0.2064, beats_loss=0.018, ecapa_loss=0.001653, whisper_loss=0.1718, over 22225.00 frames. ], tot_loss[loss=0.207, beats_loss=0.02631, ecapa_loss=0.001772, whisper_loss=0.163, over 3170597.18 frames. ], batch size: 89, lr: 3.83e-02, grad_scale: 8.0
2024-08-09 12:47:38,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=5.875
2024-08-09 12:47:39,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=10.125
2024-08-09 12:47:42,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=60.34 vs. limit=8.8125
2024-08-09 12:47:43,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=8.85
2024-08-09 12:47:45,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=82.64 vs. limit=8.85
2024-08-09 12:47:48,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600.0, ans=0.264
2024-08-09 12:47:54,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=5.4399999999999995
2024-08-09 12:47:54,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=8.85
2024-08-09 12:47:55,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=10.2
2024-08-09 12:48:03,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700.0, ans=0.263
2024-08-09 12:48:03,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3700.0, ans=0.3265625
2024-08-09 12:48:03,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=68.68 vs. limit=8.8875
2024-08-09 12:48:04,407 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 20 from Vox, 20 from AS
2024-08-09 12:48:11,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.57 vs. limit=6.85
2024-08-09 12:48:16,881 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 27 from Vox, 29 from AS
2024-08-09 12:48:17,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.02 vs. limit=5.95
2024-08-09 12:48:43,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=59.64 vs. limit=10.425
2024-08-09 12:48:46,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.862e+01 3.339e+01 4.177e+01 8.866e+01, threshold=6.678e+01, percent-clipped=0.0
2024-08-09 12:48:46,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 400, loss[loss=0.1928, beats_loss=0.01545, ecapa_loss=0.001504, whisper_loss=0.1623, over 14805.00 frames. ], tot_loss[loss=0.2009, beats_loss=0.02407, ecapa_loss=0.001743, whisper_loss=0.1594, over 3282460.25 frames. ], batch size: 54, lr: 4.05e-02, grad_scale: 16.0
2024-08-09 12:48:49,695 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 12:48:49,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4000.0, ans=0.07500000000000001 2024-08-09 12:49:01,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.55 vs. limit=7.05 2024-08-09 12:49:09,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4100.0, ans=0.04958333333333333 2024-08-09 12:49:09,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4100.0, ans=0.3078125 2024-08-09 12:49:22,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=9.075 2024-08-09 12:49:24,484 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 12:49:30,406 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 12:49:32,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.77 vs. limit=9.1125 2024-08-09 12:49:36,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.69 vs. limit=5.0 2024-08-09 12:49:37,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4300.0, ans=0.2984375 2024-08-09 12:49:40,036 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 12:49:43,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=43.65 vs. limit=9.1125 2024-08-09 12:49:45,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=5.72 2024-08-09 12:49:51,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.99 vs. limit=9.15 2024-08-09 12:49:54,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4400.0, ans=0.00991304347826087 2024-08-09 12:49:56,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.11 vs. limit=5.0 2024-08-09 12:50:00,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4400.0, ans=0.256 2024-08-09 12:50:01,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=24.34 vs. limit=9.15 2024-08-09 12:50:03,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 450, loss[loss=0.1939, beats_loss=0.01723, ecapa_loss=0.001433, whisper_loss=0.1623, over 24380.00 frames. ], tot_loss[loss=0.1974, beats_loss=0.02204, ecapa_loss=0.001711, whisper_loss=0.1582, over 3421142.65 frames. ], batch size: 91, lr: 4.28e-02, grad_scale: 16.0 2024-08-09 12:50:05,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=37.94 vs. limit=9.1875 2024-08-09 12:50:06,468 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 12:50:06,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4500.0, ans=0.2890625 2024-08-09 12:50:14,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=23.43 vs. limit=9.1875 2024-08-09 12:50:16,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4500.0, ans=9.1875 2024-08-09 12:50:18,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4600.0, ans=0.254 2024-08-09 12:50:23,209 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-09 12:50:26,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4600.0, ans=0.0475 2024-08-09 12:50:39,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=27.03 vs. limit=9.2625 2024-08-09 12:50:40,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4700.0, ans=0.04708333333333334 2024-08-09 12:50:44,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4700.0, ans=0.253 2024-08-09 12:50:46,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=81.65 vs. 
limit=9.2625 2024-08-09 12:50:55,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4800.0, ans=9.3 2024-08-09 12:50:57,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=23.43 vs. limit=9.3 2024-08-09 12:50:59,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4800.0, ans=0.275 2024-08-09 12:51:06,911 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 12:51:15,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=9.3375 2024-08-09 12:51:17,707 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 12:51:18,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+01 2.551e+01 3.113e+01 4.254e+01 7.113e+01, threshold=6.225e+01, percent-clipped=1.0 2024-08-09 12:51:18,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 500, loss[loss=0.2176, beats_loss=0.01523, ecapa_loss=0.001574, whisper_loss=0.1866, over 23997.00 frames. ], tot_loss[loss=0.1937, beats_loss=0.02056, ecapa_loss=0.001674, whisper_loss=0.1564, over 3506511.35 frames. ], batch size: 91, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:51:26,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=6.0 2024-08-09 12:51:31,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=35.11 vs. 
limit=9.375 2024-08-09 12:51:54,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=45.11 vs. limit=11.4 2024-08-09 12:51:56,990 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 12:51:58,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5200.0, ans=0.25625 2024-08-09 12:52:04,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5300.0, ans=0.044583333333333336 2024-08-09 12:52:10,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5300.0, ans=0.2515625 2024-08-09 12:52:14,931 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-09 12:52:17,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=9.4875 2024-08-09 12:52:20,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.23 vs. limit=7.7 2024-08-09 12:52:22,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=5.08 2024-08-09 12:52:28,200 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 12:52:31,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5400.0, ans=0.246875 2024-08-09 12:52:35,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 550, loss[loss=0.1883, beats_loss=0.01455, ecapa_loss=0.001458, whisper_loss=0.1592, over 15325.00 frames. 
], tot_loss[loss=0.1899, beats_loss=0.01937, ecapa_loss=0.001641, whisper_loss=0.1542, over 3577548.53 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:52:45,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=6.375 2024-08-09 12:52:46,460 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 12:52:57,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=6.4 2024-08-09 12:52:59,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=41.45 vs. limit=9.6 2024-08-09 12:53:03,698 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 12:53:10,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5700.0, ans=0.7005 2024-08-09 12:53:14,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5700.0, ans=0.23281249999999998 2024-08-09 12:53:17,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5700.0, ans=0.009630434782608695 2024-08-09 12:53:18,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. 
limit=9.6375 2024-08-09 12:53:20,915 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.843e+01 2024-08-09 12:53:26,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5800.0, ans=0.22812500000000002 2024-08-09 12:53:28,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=32.75 vs. limit=9.675 2024-08-09 12:53:38,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=32.44 vs. limit=11.925 2024-08-09 12:53:46,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5900.0, ans=0.2234375 2024-08-09 12:53:49,050 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 12:53:51,508 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 12:53:52,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+01 2.262e+01 2.880e+01 3.640e+01 5.434e+01, threshold=5.761e+01, percent-clipped=0.0 2024-08-09 12:53:52,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 600, loss[loss=0.1952, beats_loss=0.01088, ecapa_loss=0.001666, whisper_loss=0.1677, over 16687.00 frames. ], tot_loss[loss=0.1868, beats_loss=0.01848, ecapa_loss=0.001599, whisper_loss=0.1523, over 3624558.99 frames. ], batch size: 64, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:54:10,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=29.26 vs. limit=12.075 2024-08-09 12:54:11,736 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 12:54:12,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=6100.0, ans=0.2140625 2024-08-09 12:54:21,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.26 vs. limit=8.05 2024-08-09 12:54:26,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=6200.0, ans=0.20937499999999998 2024-08-09 12:54:33,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=9.825 2024-08-09 12:54:46,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=9.8625 2024-08-09 12:54:50,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=6.52 2024-08-09 12:55:00,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=9.9 2024-08-09 12:55:04,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.39 vs. limit=12.3 2024-08-09 12:55:10,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 650, loss[loss=0.1317, beats_loss=0.01713, ecapa_loss=0.001636, whisper_loss=0.09823, over 17752.00 frames. ], tot_loss[loss=0.1837, beats_loss=0.01782, ecapa_loss=0.001561, whisper_loss=0.1503, over 3661002.78 frames. 
], batch size: 75, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:55:32,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=12.45 2024-08-09 12:55:35,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=9.975 2024-08-09 12:55:49,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6700.0, ans=0.23299999999999998 2024-08-09 12:55:52,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=58.28 vs. limit=10.0125 2024-08-09 12:56:03,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=10.05 2024-08-09 12:56:05,734 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 9 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 12:56:11,642 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 12:56:13,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=12.675 2024-08-09 12:56:16,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=10.0875 2024-08-09 12:56:18,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.38 vs. limit=8.45 2024-08-09 12:56:18,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.66 vs. 
limit=10.0875 2024-08-09 12:56:19,293 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 12:56:24,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.370e+01 2.323e+01 2.699e+01 3.837e+01 7.112e+01, threshold=5.398e+01, percent-clipped=6.0 2024-08-09 12:56:25,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 700, loss[loss=0.1834, beats_loss=0.01613, ecapa_loss=0.001137, whisper_loss=0.1558, over 20548.00 frames. ], tot_loss[loss=0.1796, beats_loss=0.01734, ecapa_loss=0.001511, whisper_loss=0.1471, over 3678453.21 frames. ], batch size: 80, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:56:29,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=7000.0, ans=0.037500000000000006 2024-08-09 12:56:38,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=7100.0, ans=0.1671875 2024-08-09 12:56:49,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7100.0, ans=0.22899999999999998 2024-08-09 12:57:04,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7200.0, ans=0.22799999999999998 2024-08-09 12:57:10,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=12.975 2024-08-09 12:57:11,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.60 vs. 
limit=10.2375 2024-08-09 12:57:25,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7400.0, ans=0.153125 2024-08-09 12:57:29,912 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-09 12:57:40,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 750, loss[loss=0.1614, beats_loss=0.01479, ecapa_loss=0.001281, whisper_loss=0.1338, over 21371.00 frames. ], tot_loss[loss=0.1757, beats_loss=0.01691, ecapa_loss=0.001469, whisper_loss=0.1441, over 3692205.64 frames. ], batch size: 82, lr: 4.49e-02, grad_scale: 16.0 2024-08-09 12:57:45,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=10.3125 2024-08-09 12:57:57,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=7600.0, ans=0.035 2024-08-09 12:58:04,500 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 12:58:15,455 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 12:58:16,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=28.51 vs. limit=10.3875 2024-08-09 12:58:19,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=10.3875 2024-08-09 12:58:44,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=13.425 2024-08-09 12:58:48,037 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 12:58:49,252 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 12:58:53,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=10.4625 2024-08-09 12:58:53,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=33.10 vs. limit=10.4625 2024-08-09 12:58:57,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.305e+01 2.802e+01 3.610e+01 6.792e+01, threshold=5.604e+01, percent-clipped=3.0 2024-08-09 12:58:57,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 800, loss[loss=0.1699, beats_loss=0.01572, ecapa_loss=0.001417, whisper_loss=0.14, over 21907.00 frames. ], tot_loss[loss=0.1735, beats_loss=0.0165, ecapa_loss=0.001416, whisper_loss=0.1428, over 3766385.81 frames. ], batch size: 92, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 12:59:02,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=8000.0, ans=0.125 2024-08-09 12:59:06,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=8000.0, ans=0.126 2024-08-09 12:59:07,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=10.5 2024-08-09 12:59:16,636 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 12:59:17,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.10 vs. 
limit=10.5375 2024-08-09 12:59:18,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=8100.0, ans=0.125 2024-08-09 12:59:31,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=25.97 vs. limit=10.575 2024-08-09 12:59:37,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.575 2024-08-09 12:59:50,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=13.725 2024-08-09 12:59:58,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=8400.0, ans=0.125 2024-08-09 13:00:13,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 850, loss[loss=0.1639, beats_loss=0.013, ecapa_loss=0.001357, whisper_loss=0.1373, over 20185.00 frames. ], tot_loss[loss=0.1701, beats_loss=0.0161, ecapa_loss=0.001373, whisper_loss=0.1403, over 3768544.08 frames. ], batch size: 83, lr: 4.49e-02, grad_scale: 32.0 2024-08-09 13:00:13,691 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 13:00:16,459 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 13:00:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=8500.0, ans=0.03125 2024-08-09 13:00:22,616 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-09 13:00:28,242 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 13:00:43,086 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 13:00:50,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8700.0, ans=0.125 2024-08-09 13:00:50,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.86 vs. limit=14.025 2024-08-09 13:00:54,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=8700.0, ans=0.008978260869565217 2024-08-09 13:01:06,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=8800.0, ans=0.5920000000000001 2024-08-09 13:01:09,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=9.4 2024-08-09 13:01:24,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=8900.0, ans=0.05 2024-08-09 13:01:26,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+01 2.129e+01 2.561e+01 3.167e+01 6.018e+01, threshold=5.121e+01, percent-clipped=3.0 2024-08-09 13:01:26,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 900, loss[loss=0.1362, beats_loss=0.01638, ecapa_loss=0.0011, whisper_loss=0.1089, over 15254.00 frames. ], tot_loss[loss=0.1669, beats_loss=0.01595, ecapa_loss=0.001314, whisper_loss=0.1378, over 3807613.00 frames. ], batch size: 58, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:01:39,435 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
12 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 13:01:39,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=9100.0, ans=0.125 2024-08-09 13:01:41,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=10.9125 2024-08-09 13:01:52,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9100.0, ans=0.20900000000000002 2024-08-09 13:01:54,802 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-09 13:02:22,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9400.0, ans=0.20600000000000002 2024-08-09 13:02:28,115 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 13:02:36,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=9500.0, ans=0.008804347826086956 2024-08-09 13:02:37,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 950, loss[loss=0.1169, beats_loss=0.01587, ecapa_loss=0.0008825, whisper_loss=0.09225, over 16136.00 frames. ], tot_loss[loss=0.1649, beats_loss=0.01575, ecapa_loss=0.001268, whisper_loss=0.1364, over 3805563.61 frames. ], batch size: 62, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:02:56,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.40 vs. limit=11.1 2024-08-09 13:03:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=9600.0, ans=0.125 2024-08-09 13:03:37,431 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 13:03:48,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=11.2125 2024-08-09 13:03:50,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.609e+01 2.154e+01 2.525e+01 3.011e+01 6.635e+01, threshold=5.049e+01, percent-clipped=1.0 2024-08-09 13:03:50,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1000, loss[loss=0.1841, beats_loss=0.0117, ecapa_loss=0.001097, whisper_loss=0.1615, over 18826.00 frames. ], tot_loss[loss=0.1626, beats_loss=0.01563, ecapa_loss=0.001219, whisper_loss=0.1348, over 3829908.94 frames. ], batch size: 72, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:03:53,317 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 13:03:57,862 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.681e+01 2024-08-09 13:04:04,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.98 vs. limit=11.2875 2024-08-09 13:04:10,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=11.2875 2024-08-09 13:04:13,979 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 13:04:27,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=4.53 2024-08-09 13:04:44,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.82 vs. 
limit=8.120000000000001 2024-08-09 13:04:48,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.05 vs. limit=10.2 2024-08-09 13:04:54,166 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 13:04:56,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10400.0, ans=0.125 2024-08-09 13:05:04,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1050, loss[loss=0.1524, beats_loss=0.01493, ecapa_loss=0.001014, whisper_loss=0.1273, over 15585.00 frames. ], tot_loss[loss=0.1599, beats_loss=0.01552, ecapa_loss=0.001176, whisper_loss=0.1327, over 3815282.49 frames. ], batch size: 60, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:05:04,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10500.0, ans=0.125 2024-08-09 13:05:15,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=11.4375 2024-08-09 13:05:19,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=10600.0, ans=0.07 2024-08-09 13:05:57,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=11.55 2024-08-09 13:06:04,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=10900.0, ans=0.125 2024-08-09 13:06:06,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=15.675 2024-08-09 13:06:15,027 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
36 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 13:06:15,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=11.5875 2024-08-09 13:06:18,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+01 2.283e+01 2.878e+01 3.739e+01 7.694e+01, threshold=5.756e+01, percent-clipped=7.0 2024-08-09 13:06:18,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1100, loss[loss=0.1218, beats_loss=0.01877, ecapa_loss=0.000979, whisper_loss=0.09329, over 20497.00 frames. ], tot_loss[loss=0.1588, beats_loss=0.01537, ecapa_loss=0.001138, whisper_loss=0.1321, over 3827913.04 frames. ], batch size: 88, lr: 4.48e-02, grad_scale: 32.0 2024-08-09 13:06:19,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=11.625 2024-08-09 13:06:23,520 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-09 13:06:29,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=11.625 2024-08-09 13:06:40,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=15.825 2024-08-09 13:06:46,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.55 vs. limit=10.55 2024-08-09 13:06:50,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=11200.0, ans=0.125 2024-08-09 13:07:04,871 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 13:07:17,631 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 13:07:21,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=11400.0, ans=0.5010000000000001 2024-08-09 13:07:23,588 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-09 13:07:25,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=11400.0, ans=0.09899494936611666 2024-08-09 13:07:31,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1150, loss[loss=0.1406, beats_loss=0.01414, ecapa_loss=0.001008, whisper_loss=0.1164, over 16719.00 frames. ], tot_loss[loss=0.1574, beats_loss=0.01519, ecapa_loss=0.001102, whisper_loss=0.1312, over 3810433.70 frames. ], batch size: 66, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:07:42,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=11500.0, ans=0.05 2024-08-09 13:07:45,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11600.0, ans=0.125 2024-08-09 13:07:58,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11600.0, ans=0.125 2024-08-09 13:07:59,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.45 vs. 
limit=11.85 2024-08-09 13:08:02,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=11700.0, ans=0.2 2024-08-09 13:08:22,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11800.0, ans=0.182 2024-08-09 13:08:32,800 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 13:08:38,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=11900.0, ans=0.09899494936611666 2024-08-09 13:08:41,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=11.9625 2024-08-09 13:08:45,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.329e+01 2.685e+01 3.204e+01 5.571e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-09 13:08:45,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1200, loss[loss=0.1429, beats_loss=0.01244, ecapa_loss=0.000906, whisper_loss=0.1214, over 16893.00 frames. ], tot_loss[loss=0.1555, beats_loss=0.01504, ecapa_loss=0.001068, whisper_loss=0.1298, over 3798326.66 frames. ], batch size: 63, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:09:01,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12100.0, ans=0.125 2024-08-09 13:09:10,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.51 vs. 
limit=11.05 2024-08-09 13:09:16,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12200.0, ans=0.125 2024-08-09 13:09:28,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=16.725 2024-08-09 13:09:32,340 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-09 13:09:38,320 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 13:09:44,999 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-09 13:09:49,194 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 13:09:51,923 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 37 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 13:09:58,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1250, loss[loss=0.1467, beats_loss=0.01437, ecapa_loss=0.0008413, whisper_loss=0.1239, over 22290.00 frames. ], tot_loss[loss=0.1529, beats_loss=0.01499, ecapa_loss=0.00103, whisper_loss=0.1276, over 3825611.56 frames. ], batch size: 86, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:10:00,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=8.125 2024-08-09 13:10:06,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.16 vs. 
limit=16.875 2024-08-09 13:10:37,294 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.537e+00 2024-08-09 13:10:47,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=12800.0, ans=0.125 2024-08-09 13:10:52,008 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 13:10:58,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=12.3375 2024-08-09 13:11:02,735 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-09 13:11:12,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.459e+01 3.175e+01 4.087e+01 8.300e+01, threshold=6.351e+01, percent-clipped=6.0 2024-08-09 13:11:13,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1300, loss[loss=0.1338, beats_loss=0.01536, ecapa_loss=0.000818, whisper_loss=0.1103, over 21870.00 frames. ], tot_loss[loss=0.1514, beats_loss=0.01479, ecapa_loss=0.001003, whisper_loss=0.1266, over 3813650.21 frames. ], batch size: 89, lr: 4.47e-02, grad_scale: 32.0 2024-08-09 13:11:19,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=13000.0, ans=0.012500000000000004 2024-08-09 13:11:19,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.375 2024-08-09 13:11:30,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.35 vs. 
limit=17.325 2024-08-09 13:11:35,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=13100.0, ans=0.125 2024-08-09 13:11:36,501 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 13:11:37,883 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-09 13:11:43,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=13200.0, ans=0.438 2024-08-09 13:12:04,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=12.4875 2024-08-09 13:12:09,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=13300.0, ans=0.125 2024-08-09 13:12:21,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13400.0, ans=0.16599999999999998 2024-08-09 13:12:21,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=12.525 2024-08-09 13:12:27,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1350, loss[loss=0.1721, beats_loss=0.01279, ecapa_loss=0.0008954, whisper_loss=0.1504, over 17518.00 frames. ], tot_loss[loss=0.15, beats_loss=0.01458, ecapa_loss=0.0009748, whisper_loss=0.1257, over 3798461.03 frames. 
], batch size: 68, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:12:41,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=13600.0, ans=0.42400000000000004 2024-08-09 13:12:42,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=1.007e-02 2024-08-09 13:12:51,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=13600.0, ans=0.125 2024-08-09 13:12:52,649 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-09 13:13:06,032 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 13:13:22,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.675 2024-08-09 13:13:23,574 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 13:13:26,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=13900.0, ans=0.41350000000000003 2024-08-09 13:13:29,272 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 13:13:40,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.453e+01 2.894e+01 3.668e+01 7.407e+01, threshold=5.787e+01, percent-clipped=1.0 2024-08-09 13:13:41,009 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1400, loss[loss=0.1317, beats_loss=0.01319, ecapa_loss=0.0008821, whisper_loss=0.1097, over 14536.00 frames. ], tot_loss[loss=0.1492, beats_loss=0.01437, ecapa_loss=0.0009501, whisper_loss=0.1254, over 3786335.74 frames. 
], batch size: 59, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:13:41,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=14000.0, ans=0.00782608695652174 2024-08-09 13:13:46,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=14000.0, ans=0.008333333333333338 2024-08-09 13:14:01,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=14100.0, ans=18.075 2024-08-09 13:14:08,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=14100.0, ans=0.007804347826086957 2024-08-09 13:14:12,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14200.0, ans=0.158 2024-08-09 13:14:19,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14200.0, ans=0.125 2024-08-09 13:14:22,244 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 13:14:44,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=14400.0, ans=0.006666666666666668 2024-08-09 13:14:56,368 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1450, loss[loss=0.1283, beats_loss=0.01502, ecapa_loss=0.0007609, whisper_loss=0.1057, over 16612.00 frames. ], tot_loss[loss=0.1466, beats_loss=0.01435, ecapa_loss=0.0009241, whisper_loss=0.123, over 3763000.01 frames. ], batch size: 64, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:15:28,622 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 13:15:33,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=14600.0, ans=0.005833333333333336 2024-08-09 13:16:13,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=14800.0, ans=0.125 2024-08-09 13:16:31,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+01 2.402e+01 3.110e+01 4.073e+01 8.821e+01, threshold=6.219e+01, percent-clipped=9.0 2024-08-09 13:16:31,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1500, loss[loss=0.1398, beats_loss=0.01607, ecapa_loss=0.0006624, whisper_loss=0.1171, over 19768.00 frames. ], tot_loss[loss=0.1443, beats_loss=0.01448, ecapa_loss=0.0008954, whisper_loss=0.1209, over 3773456.38 frames. ], batch size: 76, lr: 4.46e-02, grad_scale: 32.0 2024-08-09 13:16:41,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=15000.0, ans=0.125 2024-08-09 13:16:48,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=15100.0, ans=0.007586956521739131 2024-08-09 13:16:49,765 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:16:59,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=15100.0, ans=0.125 2024-08-09 13:17:07,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=12.6 2024-08-09 13:17:12,558 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 13:17:21,953 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 13:17:26,750 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-09 13:17:32,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=15300.0, ans=0.0029166666666666716 2024-08-09 13:17:52,912 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1550, loss[loss=0.1534, beats_loss=0.0135, ecapa_loss=0.0008695, whisper_loss=0.1312, over 22559.00 frames. ], tot_loss[loss=0.1432, beats_loss=0.01436, ecapa_loss=0.0008733, whisper_loss=0.1201, over 3767064.52 frames. ], batch size: 90, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:17:55,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=15500.0, ans=0.125 2024-08-09 13:18:06,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=13.3125 2024-08-09 13:18:16,850 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 13:18:23,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=13.3875 2024-08-09 13:18:35,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=13.3875 2024-08-09 13:19:00,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=8.975 2024-08-09 13:19:03,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=15900.0, ans=0.05 2024-08-09 13:19:12,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.449e+01 2.841e+01 3.798e+01 6.790e+01, threshold=5.683e+01, percent-clipped=3.0 2024-08-09 13:19:12,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1600, loss[loss=0.1738, beats_loss=0.01318, ecapa_loss=0.0007256, whisper_loss=0.1533, over 23466.00 frames. ], tot_loss[loss=0.1419, beats_loss=0.01438, ecapa_loss=0.000848, whisper_loss=0.1191, over 3778954.83 frames. ], batch size: 90, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:19:18,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=19.5 2024-08-09 13:19:24,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=16000.0, ans=0.0 2024-08-09 13:19:28,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. limit=13.05 2024-08-09 13:19:31,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=16100.0, ans=0.125 2024-08-09 13:19:39,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=16100.0, ans=0.139 2024-08-09 13:19:50,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=16200.0, ans=0.0 2024-08-09 13:20:08,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=16300.0, ans=0.3295 2024-08-09 13:20:16,901 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 13:20:25,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=16400.0, ans=0.125 2024-08-09 13:20:32,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.96 vs. limit=13.25 2024-08-09 13:20:33,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1650, loss[loss=0.1674, beats_loss=0.01388, ecapa_loss=0.0007575, whisper_loss=0.1459, over 23070.00 frames. ], tot_loss[loss=0.1411, beats_loss=0.0144, ecapa_loss=0.0008254, whisper_loss=0.1184, over 3789716.85 frames. ], batch size: 90, lr: 4.45e-02, grad_scale: 32.0 2024-08-09 13:20:40,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=16500.0, ans=0.125 2024-08-09 13:20:46,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.40 vs. limit=19.875 2024-08-09 13:20:54,386 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.877e-02 2024-08-09 13:21:00,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=16600.0, ans=0.449 2024-08-09 13:21:03,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=16600.0, ans=0.125 2024-08-09 13:21:04,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. limit=9.15 2024-08-09 13:21:12,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. 
limit=13.7625 2024-08-09 13:21:14,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=16700.0, ans=0.125 2024-08-09 13:21:36,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=16800.0, ans=0.125 2024-08-09 13:21:41,134 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 13:21:49,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=16900.0, ans=0.04949747468305833 2024-08-09 13:21:50,264 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 13:21:52,251 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:21:54,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.579e+01 3.058e+01 4.131e+01 8.941e+01, threshold=6.115e+01, percent-clipped=7.0 2024-08-09 13:21:54,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1700, loss[loss=0.1419, beats_loss=0.01289, ecapa_loss=0.000802, whisper_loss=0.121, over 19922.00 frames. ], tot_loss[loss=0.1416, beats_loss=0.01412, ecapa_loss=0.0008104, whisper_loss=0.1194, over 3803004.27 frames. ], batch size: 79, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:22:07,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17000.0, ans=0.13 2024-08-09 13:22:09,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. 
limit=13.875 2024-08-09 13:22:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=17100.0, ans=0.3015 2024-08-09 13:22:24,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=13.9125 2024-08-09 13:22:27,979 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 13:22:29,954 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:22:32,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=17200.0, ans=0.007130434782608696 2024-08-09 13:22:37,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17200.0, ans=0.125 2024-08-09 13:22:38,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=17200.0, ans=0.125 2024-08-09 13:23:13,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1750, loss[loss=0.1413, beats_loss=0.01239, ecapa_loss=0.0008522, whisper_loss=0.1204, over 13505.00 frames. ], tot_loss[loss=0.1406, beats_loss=0.01404, ecapa_loss=0.0007966, whisper_loss=0.1186, over 3798025.70 frames. ], batch size: 56, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:23:15,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=14.0625 2024-08-09 13:23:16,632 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 35 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 13:23:27,995 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-09 13:23:29,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=14.1 2024-08-09 13:23:41,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=17700.0, ans=0.0 2024-08-09 13:24:08,240 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 13:24:20,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=17900.0, ans=0.125 2024-08-09 13:24:27,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=18000.0, ans=0.125 2024-08-09 13:24:27,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.728e+01 3.350e+01 4.234e+01 7.677e+01, threshold=6.699e+01, percent-clipped=2.0 2024-08-09 13:24:27,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1800, loss[loss=0.1607, beats_loss=0.01261, ecapa_loss=0.0006393, whisper_loss=0.1417, over 23759.00 frames. ], tot_loss[loss=0.1397, beats_loss=0.01403, ecapa_loss=0.0007803, whisper_loss=0.1178, over 3830476.88 frames. ], batch size: 91, lr: 4.44e-02, grad_scale: 32.0 2024-08-09 13:24:33,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=18000.0, ans=0.12000000000000002 2024-08-09 13:24:34,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=18000.0, ans=0.125 2024-08-09 13:24:47,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=18100.0, ans=0.0 2024-08-09 13:24:51,529 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 13:24:54,246 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 13:24:57,693 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:25:01,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=18200.0, ans=0.125 2024-08-09 13:25:05,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=18200.0, ans=0.125 2024-08-09 13:25:09,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=5.73 2024-08-09 13:25:14,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=11.32 2024-08-09 13:25:18,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=18300.0, ans=0.125 2024-08-09 13:25:21,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=5.745 2024-08-09 13:25:22,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=18300.0, ans=0.125 2024-08-09 13:25:24,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=14.3625 2024-08-09 13:25:43,244 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1850, loss[loss=0.1501, beats_loss=0.01179, ecapa_loss=0.0008284, whisper_loss=0.13, over 21276.00 frames. ], tot_loss[loss=0.1398, beats_loss=0.01392, ecapa_loss=0.0007746, whisper_loss=0.1181, over 3837773.66 frames. 
], batch size: 83, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:26:01,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=18600.0, ans=0.11400000000000002 2024-08-09 13:26:03,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=21.45 2024-08-09 13:26:23,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=18700.0, ans=9.675 2024-08-09 13:26:38,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=18800.0, ans=0.006782608695652174 2024-08-09 13:26:38,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=18800.0, ans=0.07 2024-08-09 13:26:40,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18800.0, ans=0.125 2024-08-09 13:26:51,004 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 13:27:00,434 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 13:27:03,053 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.592e+01 3.002e+01 4.008e+01 1.371e+02, threshold=6.005e+01, percent-clipped=3.0 2024-08-09 13:27:03,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1900, loss[loss=0.1371, beats_loss=0.01352, ecapa_loss=0.0007729, whisper_loss=0.1158, over 16589.00 frames. ], tot_loss[loss=0.1397, beats_loss=0.01383, ecapa_loss=0.000777, whisper_loss=0.1181, over 3832982.86 frames. ], batch size: 66, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:27:25,418 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 13:27:30,367 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 13:27:33,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19200.0, ans=0.125 2024-08-09 13:27:43,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=19200.0, ans=0.035 2024-08-09 13:27:51,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19300.0, ans=0.125 2024-08-09 13:27:56,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=21.975 2024-08-09 13:28:02,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=9.825 2024-08-09 13:28:08,348 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 13:28:20,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 1950, loss[loss=0.1361, beats_loss=0.01617, ecapa_loss=0.0006948, whisper_loss=0.113, over 23139.00 frames. ], tot_loss[loss=0.1394, beats_loss=0.01385, ecapa_loss=0.0007718, whisper_loss=0.1178, over 3832226.85 frames. ], batch size: 93, lr: 4.43e-02, grad_scale: 32.0 2024-08-09 13:28:41,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=19600.0, ans=0.125 2024-08-09 13:28:49,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.20 vs. 
limit=14.8875 2024-08-09 13:29:01,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=19700.0, ans=0.125 2024-08-09 13:29:11,346 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 13:29:31,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=19900.0, ans=0.20350000000000001 2024-08-09 13:29:33,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=11.04 vs. limit=9.975 2024-08-09 13:29:35,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.629e+01 3.262e+01 3.981e+01 7.661e+01, threshold=6.525e+01, percent-clipped=2.0 2024-08-09 13:29:35,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2000, loss[loss=0.1484, beats_loss=0.01357, ecapa_loss=0.0006846, whisper_loss=0.128, over 15316.00 frames. ], tot_loss[loss=0.1386, beats_loss=0.01387, ecapa_loss=0.0007686, whisper_loss=0.117, over 3800449.28 frames. ], batch size: 58, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:29:36,875 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-09 13:29:45,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20000.0, ans=0.125 2024-08-09 13:30:09,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-08-09 13:30:25,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5 2024-08-09 13:30:26,078 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 13:30:26,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=20300.0, ans=0.0064565217391304355 2024-08-09 13:30:27,274 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-09 13:30:29,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=20300.0, ans=0.125 2024-08-09 13:30:37,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-08-09 13:30:38,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=20400.0, ans=0.125 2024-08-09 13:30:45,907 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-09 13:30:46,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2024-08-09 13:30:54,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2050, loss[loss=0.1405, beats_loss=0.01564, ecapa_loss=0.0005804, whisper_loss=0.1191, over 22497.00 frames. ], tot_loss[loss=0.1375, beats_loss=0.01398, ecapa_loss=0.0007604, whisper_loss=0.116, over 3839777.41 frames. ], batch size: 85, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:30:59,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=20500.0, ans=0.2 2024-08-09 13:31:02,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.59 vs. 
limit=15.0 2024-08-09 13:31:27,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.78 vs. limit=10.0 2024-08-09 13:31:40,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20800.0, ans=0.0 2024-08-09 13:31:57,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20900.0, ans=0.1 2024-08-09 13:32:07,537 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 13:32:08,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.818e+01 3.204e+01 4.044e+01 7.345e+01, threshold=6.407e+01, percent-clipped=1.0 2024-08-09 13:32:08,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2100, loss[loss=0.1514, beats_loss=0.01084, ecapa_loss=0.0007031, whisper_loss=0.1336, over 23780.00 frames. ], tot_loss[loss=0.1373, beats_loss=0.01405, ecapa_loss=0.0007445, whisper_loss=0.1158, over 3809650.99 frames. ], batch size: 89, lr: 4.42e-02, grad_scale: 64.0 2024-08-09 13:32:11,714 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 17 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 13:32:41,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.12 vs. limit=22.5 2024-08-09 13:32:59,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2024-08-09 13:33:13,532 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 13:33:25,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2150, loss[loss=0.124, beats_loss=0.01074, ecapa_loss=0.0008637, whisper_loss=0.1046, over 14521.00 frames. 
], tot_loss[loss=0.136, beats_loss=0.0142, ecapa_loss=0.0007359, whisper_loss=0.1144, over 3831131.17 frames. ], batch size: 58, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:33:25,866 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 13:33:46,067 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:33:46,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-08-09 13:34:11,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-08-09 13:34:13,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=21800.0, ans=0.125 2024-08-09 13:34:31,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21900.0, ans=0.2 2024-08-09 13:34:31,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.67 vs. limit=22.5 2024-08-09 13:34:42,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.673e+01 3.209e+01 4.237e+01 7.311e+01, threshold=6.417e+01, percent-clipped=1.0 2024-08-09 13:34:42,188 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2200, loss[loss=0.1032, beats_loss=0.01599, ecapa_loss=0.0006989, whisper_loss=0.08025, over 22080.00 frames. ], tot_loss[loss=0.1366, beats_loss=0.01403, ecapa_loss=0.0007366, whisper_loss=0.1152, over 3818586.75 frames. 
], batch size: 92, lr: 4.41e-02, grad_scale: 64.0 2024-08-09 13:34:42,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=22000.0, ans=0.125 2024-08-09 13:34:49,859 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 13:34:51,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-08-09 13:35:08,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22100.0, ans=0.125 2024-08-09 13:35:28,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=22300.0, ans=0.2 2024-08-09 13:35:34,354 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-09 13:35:47,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22400.0, ans=0.125 2024-08-09 13:35:47,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=22400.0, ans=0.125 2024-08-09 13:35:55,007 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 13:35:55,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=22400.0, ans=0.125 2024-08-09 13:36:01,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2250, loss[loss=0.1193, beats_loss=0.01583, ecapa_loss=0.0006984, whisper_loss=0.09648, over 19838.00 frames. ], tot_loss[loss=0.1368, beats_loss=0.01398, ecapa_loss=0.0007335, whisper_loss=0.1155, over 3828442.12 frames. 
], batch size: 80, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:36:05,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22500.0, ans=0.1 2024-08-09 13:36:07,012 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 13:36:22,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=22600.0, ans=0.125 2024-08-09 13:37:04,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.80 vs. limit=22.5 2024-08-09 13:37:09,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22700.0, ans=0.125 2024-08-09 13:37:19,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=22800.0, ans=0.125 2024-08-09 13:37:20,818 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 13:37:24,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2024-08-09 13:37:35,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22900.0, ans=0.125 2024-08-09 13:37:37,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=22900.0, ans=0.2 2024-08-09 13:37:43,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-09 13:37:44,989 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 13:37:45,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.951e+01 3.575e+01 4.087e+01 9.473e+01, threshold=7.150e+01, percent-clipped=2.0 2024-08-09 13:37:45,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2300, loss[loss=0.1324, beats_loss=0.01236, ecapa_loss=0.000716, whisper_loss=0.1129, over 22788.00 frames. ], tot_loss[loss=0.1364, beats_loss=0.014, ecapa_loss=0.0007205, whisper_loss=0.1152, over 3865512.03 frames. ], batch size: 91, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:37:57,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-09 13:38:20,342 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 31 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-09 13:38:34,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=23300.0, ans=0.1 2024-08-09 13:38:48,829 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 13:38:48,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=23400.0, ans=0.035 2024-08-09 13:38:52,070 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 13:38:53,478 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 13:38:56,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=23400.0, ans=0.1 2024-08-09 13:39:04,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2350, loss[loss=0.1184, beats_loss=0.01599, ecapa_loss=0.0006295, whisper_loss=0.09616, over 22143.00 frames. ], tot_loss[loss=0.1359, beats_loss=0.0139, ecapa_loss=0.0007105, whisper_loss=0.1149, over 3839540.64 frames. 
], batch size: 89, lr: 4.40e-02, grad_scale: 64.0 2024-08-09 13:39:13,392 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 13:39:39,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=23700.0, ans=0.125 2024-08-09 13:39:51,417 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-09 13:40:04,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=23800.0, ans=0.125 2024-08-09 13:40:12,565 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 13:40:20,715 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 13:40:23,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.811e+01 3.461e+01 4.504e+01 7.215e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-09 13:40:24,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2400, loss[loss=0.118, beats_loss=0.01411, ecapa_loss=0.0005703, whisper_loss=0.09816, over 20595.00 frames. ], tot_loss[loss=0.135, beats_loss=0.01396, ecapa_loss=0.0006947, whisper_loss=0.1141, over 3841579.35 frames. ], batch size: 82, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:40:31,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-09 13:40:36,549 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 13:40:51,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.57 vs. 
limit=15.0 2024-08-09 13:40:53,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.82 vs. limit=22.5 2024-08-09 13:41:12,428 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-09 13:41:27,384 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 13:41:34,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.04 vs. limit=22.5 2024-08-09 13:41:36,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24400.0, ans=0.125 2024-08-09 13:41:39,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2450, loss[loss=0.138, beats_loss=0.01525, ecapa_loss=0.0006556, whisper_loss=0.1162, over 21755.00 frames. ], tot_loss[loss=0.1342, beats_loss=0.01391, ecapa_loss=0.0006899, whisper_loss=0.1134, over 3838812.56 frames. ], batch size: 88, lr: 4.39e-02, grad_scale: 64.0 2024-08-09 13:41:39,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=24500.0, ans=0.125 2024-08-09 13:41:47,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=24500.0, ans=0.005543478260869566 2024-08-09 13:42:05,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-09 13:42:05,854 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
11 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 13:42:06,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=24600.0, ans=0.5 2024-08-09 13:42:19,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=24700.0, ans=0.125 2024-08-09 13:42:19,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24700.0, ans=0.125 2024-08-09 13:42:35,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24800.0, ans=0.1 2024-08-09 13:42:41,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24900.0, ans=0.125 2024-08-09 13:42:43,798 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-09 13:42:52,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.830e+01 3.469e+01 4.522e+01 1.002e+02, threshold=6.939e+01, percent-clipped=2.0 2024-08-09 13:42:52,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2500, loss[loss=0.1088, beats_loss=0.01811, ecapa_loss=0.0005443, whisper_loss=0.0852, over 19184.00 frames. ], tot_loss[loss=0.1337, beats_loss=0.01386, ecapa_loss=0.0006844, whisper_loss=0.113, over 3850406.94 frames. ], batch size: 79, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:42:54,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=25000.0, ans=0.125 2024-08-09 13:43:00,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=25000.0, ans=0.125 2024-08-09 13:43:13,522 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 13:43:18,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25100.0, ans=0.1 2024-08-09 13:43:30,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=25200.0, ans=0.125 2024-08-09 13:43:41,404 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-09 13:43:42,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25300.0, ans=0.1 2024-08-09 13:43:49,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2024-08-09 13:43:52,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=25400.0, ans=0.125 2024-08-09 13:43:52,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2024-08-09 13:43:58,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25400.0, ans=0.1 2024-08-09 13:44:06,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=25400.0, ans=0.125 2024-08-09 13:44:07,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=25500.0, ans=0.04949747468305833 2024-08-09 13:44:08,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2550, loss[loss=0.1261, beats_loss=0.01655, ecapa_loss=0.0005845, whisper_loss=0.1037, over 23331.00 frames. 
], tot_loss[loss=0.1336, beats_loss=0.01387, ecapa_loss=0.000676, whisper_loss=0.113, over 3833355.21 frames. ], batch size: 90, lr: 4.38e-02, grad_scale: 64.0 2024-08-09 13:44:12,248 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 13:44:16,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=25500.0, ans=0.125 2024-08-09 13:44:25,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25600.0, ans=0.1 2024-08-09 13:44:29,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=25600.0, ans=0.025 2024-08-09 13:44:31,765 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 13:44:35,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2024-08-09 13:44:39,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=25700.0, ans=0.005282608695652174 2024-08-09 13:44:50,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2024-08-09 13:44:52,961 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 13:45:01,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=25800.0, ans=0.125 2024-08-09 13:45:12,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=25900.0, ans=0.0 2024-08-09 13:45:14,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=12.0 2024-08-09 13:45:18,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=25900.0, ans=0.2 2024-08-09 13:45:21,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 3.019e+01 3.579e+01 4.793e+01 1.038e+02, threshold=7.158e+01, percent-clipped=5.0 2024-08-09 13:45:21,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2600, loss[loss=0.1316, beats_loss=0.01434, ecapa_loss=0.000617, whisper_loss=0.1111, over 22718.00 frames. ], tot_loss[loss=0.1334, beats_loss=0.0139, ecapa_loss=0.0006639, whisper_loss=0.1129, over 3849491.62 frames. ], batch size: 90, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:45:30,005 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 13:45:40,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. 
limit=22.5 2024-08-09 13:46:27,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=26400.0, ans=0.125 2024-08-09 13:46:29,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=26400.0, ans=0.0 2024-08-09 13:46:34,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2650, loss[loss=0.1498, beats_loss=0.01113, ecapa_loss=0.0006534, whisper_loss=0.1321, over 22742.00 frames. ], tot_loss[loss=0.1337, beats_loss=0.01373, ecapa_loss=0.0006584, whisper_loss=0.1134, over 3862074.30 frames. ], batch size: 91, lr: 4.37e-02, grad_scale: 64.0 2024-08-09 13:46:53,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26600.0, ans=0.1 2024-08-09 13:46:59,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=26600.0, ans=0.125 2024-08-09 13:47:01,026 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 13:47:03,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26700.0, ans=0.1 2024-08-09 13:47:04,986 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 13:47:21,254 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-09 13:47:33,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=26900.0, ans=0.125 2024-08-09 13:47:38,690 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 13:47:43,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=26900.0, ans=0.2 2024-08-09 13:47:47,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.863e+01 3.296e+01 3.949e+01 7.406e+01, threshold=6.593e+01, percent-clipped=2.0 2024-08-09 13:47:47,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2700, loss[loss=0.1351, beats_loss=0.008254, ecapa_loss=0.0006179, whisper_loss=0.1206, over 15584.00 frames. ], tot_loss[loss=0.1335, beats_loss=0.01382, ecapa_loss=0.0006494, whisper_loss=0.1132, over 3894986.64 frames. ], batch size: 56, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:47:47,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=27000.0, ans=0.125 2024-08-09 13:47:59,198 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 13:47:59,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=27000.0, ans=0.2 2024-08-09 13:48:03,848 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 13:48:52,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=27400.0, ans=0.95 2024-08-09 13:49:01,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2750, loss[loss=0.1235, beats_loss=0.01359, ecapa_loss=0.0006291, whisper_loss=0.1036, over 23042.00 frames. ], tot_loss[loss=0.1328, beats_loss=0.01384, ecapa_loss=0.0006406, whisper_loss=0.1126, over 3878459.29 frames. 
], batch size: 95, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:49:03,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=27500.0, ans=0.125 2024-08-09 13:49:17,822 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 13:49:19,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2024-08-09 13:49:22,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=27600.0, ans=0.125 2024-08-09 13:49:34,807 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 13:50:13,619 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 13:50:19,592 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.874e+01 3.420e+01 4.195e+01 6.815e+01, threshold=6.839e+01, percent-clipped=2.0 2024-08-09 13:50:19,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2800, loss[loss=0.1028, beats_loss=0.01466, ecapa_loss=0.0007166, whisper_loss=0.08097, over 16785.00 frames. ], tot_loss[loss=0.1327, beats_loss=0.01376, ecapa_loss=0.0006379, whisper_loss=0.1125, over 3849747.10 frames. ], batch size: 70, lr: 4.36e-02, grad_scale: 64.0 2024-08-09 13:50:21,448 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-09 13:50:27,720 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 13:50:44,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. 
limit=15.0 2024-08-09 13:50:56,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=28200.0, ans=0.0 2024-08-09 13:51:01,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28200.0, ans=0.1 2024-08-09 13:51:19,361 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-09 13:51:22,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=12.0 2024-08-09 13:51:31,841 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 45 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-09 13:51:38,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2850, loss[loss=0.1295, beats_loss=0.01574, ecapa_loss=0.0005672, whisper_loss=0.1081, over 17864.00 frames. ], tot_loss[loss=0.1326, beats_loss=0.01388, ecapa_loss=0.0006277, whisper_loss=0.1124, over 3845578.19 frames. ], batch size: 71, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:51:42,151 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 13:51:42,417 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.285e-02 2024-08-09 13:51:43,476 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-09 13:51:51,392 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 13:51:59,774 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 13:52:00,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=28600.0, ans=0.004652173913043478 2024-08-09 13:52:00,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2024-08-09 13:52:01,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=28600.0, ans=0.125 2024-08-09 13:52:24,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=18.26 vs. limit=15.0 2024-08-09 13:52:27,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-09 13:52:59,696 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 13:53:00,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 3.002e+01 3.706e+01 4.572e+01 7.980e+01, threshold=7.411e+01, percent-clipped=5.0 2024-08-09 13:53:00,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2900, loss[loss=0.1117, beats_loss=0.01683, ecapa_loss=0.000526, whisper_loss=0.08964, over 14224.00 frames. ], tot_loss[loss=0.1325, beats_loss=0.01388, ecapa_loss=0.000631, whisper_loss=0.1123, over 3867940.62 frames. ], batch size: 57, lr: 4.35e-02, grad_scale: 64.0 2024-08-09 13:53:00,987 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 13:53:15,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=29100.0, ans=0.125 2024-08-09 13:53:18,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=29100.0, ans=0.0 2024-08-09 13:53:26,386 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 13:53:26,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=29100.0, ans=0.125 2024-08-09 13:53:30,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=29200.0, ans=0.125 2024-08-09 13:53:40,422 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 13:53:40,708 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 13:53:43,155 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 13:54:19,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=15.0 2024-08-09 13:54:19,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 2950, loss[loss=0.1503, beats_loss=0.009029, ecapa_loss=0.0009197, whisper_loss=0.132, over 15245.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.01384, ecapa_loss=0.0006292, whisper_loss=0.1119, over 3846906.85 frames. ], batch size: 66, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:54:25,038 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 13:54:31,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=29500.0, ans=0.2 2024-08-09 13:54:36,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29600.0, ans=0.1 2024-08-09 13:54:36,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5 2024-08-09 13:54:47,284 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 13:54:47,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=29600.0, ans=0.125 2024-08-09 13:55:21,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=29900.0, ans=0.2 2024-08-09 13:55:22,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-09 13:55:34,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2024-08-09 13:55:35,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=29900.0, ans=0.2 2024-08-09 13:55:39,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 3.111e+01 3.701e+01 4.234e+01 7.297e+01, threshold=7.402e+01, percent-clipped=0.0 2024-08-09 13:55:39,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3000, loss[loss=0.1123, beats_loss=0.01131, ecapa_loss=0.0006481, whisper_loss=0.09448, over 17247.00 frames. 
], tot_loss[loss=0.1318, beats_loss=0.01393, ecapa_loss=0.0006209, whisper_loss=0.1116, over 3865299.35 frames. ], batch size: 69, lr: 4.34e-02, grad_scale: 64.0 2024-08-09 13:55:39,240 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 13:56:23,638 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on ASR_libri: loss=0.3107, beats_loss=0, ecapa_loss=0.001585, whisper_loss=0.2948, over 922467.00 frames. 2024-08-09 13:56:41,609 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on SV_voxceleb1: loss=0.0159, beats_loss=0, ecapa_loss=0.00159, whisper_loss=0, over 939242.00 frames. 2024-08-09 13:58:39,709 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on AT_audioset: loss=0.03327, beats_loss=0.03327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 13:58:39,713 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 13:58:45,282 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.370e+01 2024-08-09 13:58:45,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-09 13:58:47,869 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-09 13:59:02,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=30100.0, ans=0.0 2024-08-09 13:59:06,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=30100.0, ans=0.0 2024-08-09 13:59:12,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=30200.0, ans=0.07 2024-08-09 13:59:37,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. 
limit=6.0 2024-08-09 13:59:42,835 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 13:59:44,284 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 13:59:49,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=30400.0, ans=0.2 2024-08-09 13:59:58,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=30400.0, ans=0.05 2024-08-09 14:00:02,703 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 14:00:03,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=30500.0, ans=0.004239130434782609 2024-08-09 14:00:04,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3050, loss[loss=0.1308, beats_loss=0.01224, ecapa_loss=0.0006513, whisper_loss=0.1121, over 16669.00 frames. ], tot_loss[loss=0.1316, beats_loss=0.01392, ecapa_loss=0.0006151, whisper_loss=0.1115, over 3875830.32 frames. ], batch size: 67, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:00:38,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5 2024-08-09 14:01:01,129 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-09 14:01:11,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30900.0, ans=0.125 2024-08-09 14:01:17,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31000.0, ans=0.1 2024-08-09 14:01:18,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 3.101e+01 3.734e+01 4.761e+01 9.232e+01, threshold=7.468e+01, percent-clipped=3.0 2024-08-09 14:01:18,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3100, loss[loss=0.1435, beats_loss=0.01473, ecapa_loss=0.0006188, whisper_loss=0.1226, over 17693.00 frames. ], tot_loss[loss=0.1318, beats_loss=0.01388, ecapa_loss=0.0006099, whisper_loss=0.1118, over 3887914.96 frames. ], batch size: 72, lr: 4.33e-02, grad_scale: 64.0 2024-08-09 14:01:20,627 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 14:01:23,275 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 14:01:26,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.91 vs. 
limit=15.0 2024-08-09 14:01:27,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=31000.0, ans=0.125 2024-08-09 14:01:30,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31100.0, ans=0.125 2024-08-09 14:01:35,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31100.0, ans=0.125 2024-08-09 14:01:47,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=12.0 2024-08-09 14:02:16,376 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 14:02:23,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3150, loss[loss=0.1166, beats_loss=0.01516, ecapa_loss=0.0005187, whisper_loss=0.09624, over 19730.00 frames. ], tot_loss[loss=0.1319, beats_loss=0.01387, ecapa_loss=0.0006072, whisper_loss=0.112, over 3858166.06 frames. ], batch size: 79, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:02:41,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=31600.0, ans=0.0 2024-08-09 14:02:44,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31600.0, ans=0.1 2024-08-09 14:02:46,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=31600.0, ans=0.125 2024-08-09 14:03:06,498 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
31 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-09 14:03:06,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=31800.0, ans=0.2 2024-08-09 14:03:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=31900.0, ans=0.0 2024-08-09 14:03:30,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 3.005e+01 3.440e+01 4.161e+01 7.835e+01, threshold=6.880e+01, percent-clipped=1.0 2024-08-09 14:03:30,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3200, loss[loss=0.1147, beats_loss=0.01686, ecapa_loss=0.000621, whisper_loss=0.09158, over 17386.00 frames. ], tot_loss[loss=0.1316, beats_loss=0.01384, ecapa_loss=0.0006027, whisper_loss=0.1117, over 3852264.11 frames. ], batch size: 74, lr: 4.32e-02, grad_scale: 64.0 2024-08-09 14:03:49,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=32100.0, ans=0.125 2024-08-09 14:03:51,688 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 14:04:14,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=32300.0, ans=0.0 2024-08-09 14:04:18,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32300.0, ans=0.1 2024-08-09 14:04:22,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.30 vs. limit=22.5 2024-08-09 14:04:36,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3250, loss[loss=0.1656, beats_loss=0.0128, ecapa_loss=0.0005304, whisper_loss=0.1474, over 24191.00 frames. ], tot_loss[loss=0.1314, beats_loss=0.01378, ecapa_loss=0.0005997, whisper_loss=0.1116, over 3822099.15 frames. 
], batch size: 90, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:05:04,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=32700.0, ans=0.0 2024-08-09 14:05:07,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=32700.0, ans=0.125 2024-08-09 14:05:09,623 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:05:11,975 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 33 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 14:05:24,253 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 14:05:28,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=32900.0, ans=0.125 2024-08-09 14:05:36,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.56 vs. limit=10.0 2024-08-09 14:05:42,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.060e+01 3.523e+01 4.253e+01 9.588e+01, threshold=7.047e+01, percent-clipped=8.0 2024-08-09 14:05:42,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3300, loss[loss=0.1148, beats_loss=0.01548, ecapa_loss=0.0004872, whisper_loss=0.09444, over 21522.00 frames. ], tot_loss[loss=0.1308, beats_loss=0.01376, ecapa_loss=0.0005955, whisper_loss=0.1111, over 3822835.20 frames. ], batch size: 83, lr: 4.31e-02, grad_scale: 64.0 2024-08-09 14:05:46,457 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
29 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 14:06:02,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33100.0, ans=0.1 2024-08-09 14:06:06,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33100.0, ans=0.1 2024-08-09 14:06:12,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=33200.0, ans=0.0 2024-08-09 14:06:27,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=33300.0, ans=0.003630434782608696 2024-08-09 14:06:40,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-09 14:06:40,871 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 14:06:43,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2024-08-09 14:06:44,733 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 14:06:46,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=33500.0, ans=0.125 2024-08-09 14:06:47,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3350, loss[loss=0.1321, beats_loss=0.01271, ecapa_loss=0.000499, whisper_loss=0.1144, over 19934.00 frames. ], tot_loss[loss=0.1298, beats_loss=0.01375, ecapa_loss=0.0005873, whisper_loss=0.1101, over 3842290.33 frames. 
], batch size: 75, lr: 4.30e-02, grad_scale: 64.0 2024-08-09 14:07:05,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=33600.0, ans=0.2 2024-08-09 14:07:10,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=33600.0, ans=0.0 2024-08-09 14:07:25,016 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 14:07:36,282 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 14:07:46,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=33900.0, ans=0.0 2024-08-09 14:07:53,333 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 3.123e+01 3.529e+01 4.678e+01 1.147e+02, threshold=7.058e+01, percent-clipped=6.0 2024-08-09 14:07:53,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3400, loss[loss=0.1388, beats_loss=0.01147, ecapa_loss=0.0004797, whisper_loss=0.1226, over 18429.00 frames. ], tot_loss[loss=0.1295, beats_loss=0.01372, ecapa_loss=0.000581, whisper_loss=0.1099, over 3872639.65 frames. ], batch size: 68, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:08:10,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=34100.0, ans=0.1 2024-08-09 14:08:33,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34300.0, ans=0.125 2024-08-09 14:08:52,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-08-09 14:08:56,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. 
limit=15.0 2024-08-09 14:08:57,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3450, loss[loss=0.1229, beats_loss=0.01443, ecapa_loss=0.0005794, whisper_loss=0.1026, over 22670.00 frames. ], tot_loss[loss=0.1294, beats_loss=0.01364, ecapa_loss=0.000583, whisper_loss=0.1099, over 3879347.11 frames. ], batch size: 93, lr: 4.29e-02, grad_scale: 64.0 2024-08-09 14:08:58,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34500.0, ans=0.125 2024-08-09 14:09:02,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=34500.0, ans=0.04949747468305833 2024-08-09 14:09:04,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.28 vs. limit=22.5 2024-08-09 14:09:12,075 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-09 14:09:26,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34700.0, ans=0.1 2024-08-09 14:09:32,908 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 14:09:42,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34800.0, ans=0.0033043478260869567 2024-08-09 14:09:48,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=34900.0, ans=0.125 2024-08-09 14:09:48,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.89 vs. 
limit=15.0 2024-08-09 14:10:02,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.921e+01 3.468e+01 4.313e+01 8.519e+01, threshold=6.936e+01, percent-clipped=1.0 2024-08-09 14:10:02,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3500, loss[loss=0.1584, beats_loss=0.0111, ecapa_loss=0.0005288, whisper_loss=0.142, over 17238.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01376, ecapa_loss=0.000576, whisper_loss=0.1095, over 3868495.63 frames. ], batch size: 63, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:10:09,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=35000.0, ans=0.0 2024-08-09 14:10:16,068 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 14:10:24,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35100.0, ans=0.1 2024-08-09 14:10:46,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35300.0, ans=0.125 2024-08-09 14:10:53,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2024-08-09 14:10:59,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=35400.0, ans=0.125 2024-08-09 14:11:07,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3550, loss[loss=0.1401, beats_loss=0.01247, ecapa_loss=0.0006417, whisper_loss=0.1212, over 14842.00 frames. ], tot_loss[loss=0.129, beats_loss=0.01365, ecapa_loss=0.0005737, whisper_loss=0.1096, over 3857998.61 frames. 
], batch size: 61, lr: 4.28e-02, grad_scale: 64.0 2024-08-09 14:11:10,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35500.0, ans=0.1 2024-08-09 14:11:11,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-09 14:11:12,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=35500.0, ans=0.0 2024-08-09 14:11:16,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35500.0, ans=0.1 2024-08-09 14:11:21,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.060e+00 2024-08-09 14:11:33,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.84 vs. limit=22.5 2024-08-09 14:11:35,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35700.0, ans=0.1 2024-08-09 14:11:46,831 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
14 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 14:11:51,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35800.0, ans=0.125 2024-08-09 14:12:02,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=35900.0, ans=0.125 2024-08-09 14:12:12,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=36000.0, ans=0.125 2024-08-09 14:12:13,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 3.146e+01 3.821e+01 4.721e+01 1.022e+02, threshold=7.642e+01, percent-clipped=5.0 2024-08-09 14:12:13,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3600, loss[loss=0.1267, beats_loss=0.0162, ecapa_loss=0.000505, whisper_loss=0.1054, over 19901.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.0137, ecapa_loss=0.0005682, whisper_loss=0.1092, over 3865674.86 frames. ], batch size: 79, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:12:29,535 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-09 14:12:38,760 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-09 14:12:48,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=36200.0, ans=0.0 2024-08-09 14:12:50,645 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-09 14:12:50,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=36200.0, ans=0.0 2024-08-09 14:12:54,756 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 14:12:56,088 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 14:13:12,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2024-08-09 14:13:19,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3650, loss[loss=0.1386, beats_loss=0.01464, ecapa_loss=0.0005222, whisper_loss=0.1187, over 20499.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.01379, ecapa_loss=0.000564, whisper_loss=0.1095, over 3887837.16 frames. ], batch size: 81, lr: 4.27e-02, grad_scale: 64.0 2024-08-09 14:13:23,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=36500.0, ans=0.0 2024-08-09 14:13:25,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=36500.0, ans=0.2 2024-08-09 14:13:35,853 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 14:13:38,597 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-09 14:13:49,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=36700.0, ans=0.0 2024-08-09 14:13:58,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.46 vs. limit=22.5 2024-08-09 14:14:02,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2024-08-09 14:14:09,309 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-09 14:14:11,965 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 14:14:24,857 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.925e+01 3.373e+01 4.021e+01 6.000e+01, threshold=6.747e+01, percent-clipped=0.0 2024-08-09 14:14:24,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3700, loss[loss=0.1523, beats_loss=0.01282, ecapa_loss=0.0005138, whisper_loss=0.1344, over 14891.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.01379, ecapa_loss=0.0005619, whisper_loss=0.1095, over 3899671.45 frames. ], batch size: 57, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:14:27,850 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-09 14:14:32,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=37000.0, ans=0.04949747468305833 2024-08-09 14:14:32,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.23 vs. limit=6.0 2024-08-09 14:14:49,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.29 vs. limit=22.5 2024-08-09 14:14:59,803 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 16 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 14:15:07,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37300.0, ans=0.1 2024-08-09 14:15:15,081 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 14:15:19,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=37400.0, ans=0.0 2024-08-09 14:15:20,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=37400.0, ans=0.2 2024-08-09 14:15:25,174 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 14:15:30,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3750, loss[loss=0.149, beats_loss=0.01317, ecapa_loss=0.000549, whisper_loss=0.1304, over 19770.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01384, ecapa_loss=0.0005563, whisper_loss=0.1092, over 3882970.83 frames. ], batch size: 79, lr: 4.26e-02, grad_scale: 64.0 2024-08-09 14:15:33,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=15.0 2024-08-09 14:15:37,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37500.0, ans=0.1 2024-08-09 14:15:37,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=37500.0, ans=0.002717391304347826 2024-08-09 14:15:42,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=37600.0, ans=0.125 2024-08-09 14:16:20,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=37800.0, ans=0.125 2024-08-09 14:16:33,640 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-09 14:16:35,482 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-09 14:16:36,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.197e+01 3.801e+01 4.581e+01 9.571e+01, threshold=7.603e+01, percent-clipped=5.0 2024-08-09 14:16:36,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3800, loss[loss=0.09747, beats_loss=0.01435, ecapa_loss=0.0005206, whisper_loss=0.07791, over 14622.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.01393, ecapa_loss=0.0005581, whisper_loss=0.109, over 3885430.24 frames. ], batch size: 55, lr: 4.25e-02, grad_scale: 64.0 2024-08-09 14:16:36,787 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 14:16:42,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2024-08-09 14:17:08,045 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 14:17:16,573 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.748e-02 2024-08-09 14:17:21,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38300.0, ans=0.1 2024-08-09 14:17:29,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=38400.0, ans=0.2 2024-08-09 14:17:36,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38400.0, ans=0.1 2024-08-09 14:17:41,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3850, loss[loss=0.1513, beats_loss=0.01006, ecapa_loss=0.0006238, whisper_loss=0.135, over 15246.00 frames. ], tot_loss[loss=0.1278, beats_loss=0.01394, ecapa_loss=0.0005551, whisper_loss=0.1083, over 3872030.55 frames. 
], batch size: 60, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:17:42,148 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 14:17:42,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=38500.0, ans=0.125 2024-08-09 14:17:42,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=38500.0, ans=0.0025000000000000005 2024-08-09 14:17:51,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-09 14:17:52,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=38500.0, ans=0.125 2024-08-09 14:17:58,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=38600.0, ans=0.125 2024-08-09 14:18:01,024 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 14:18:18,512 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 14:18:18,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=12.0 2024-08-09 14:18:49,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+01 3.021e+01 3.699e+01 4.570e+01 7.428e+01, threshold=7.398e+01, percent-clipped=0.0 2024-08-09 14:18:49,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3900, loss[loss=0.1359, beats_loss=0.01325, ecapa_loss=0.0005212, whisper_loss=0.1174, over 17192.00 frames. ], tot_loss[loss=0.1281, beats_loss=0.01387, ecapa_loss=0.0005546, whisper_loss=0.1086, over 3864113.37 frames. 
], batch size: 68, lr: 4.24e-02, grad_scale: 64.0 2024-08-09 14:18:51,909 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-09 14:18:53,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=39000.0, ans=0.2 2024-08-09 14:19:00,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.51 vs. limit=22.5 2024-08-09 14:19:13,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=39200.0, ans=0.05 2024-08-09 14:19:16,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39200.0, ans=0.1 2024-08-09 14:19:25,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=39200.0, ans=0.125 2024-08-09 14:19:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=39300.0, ans=0.125 2024-08-09 14:19:42,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=39400.0, ans=0.125 2024-08-09 14:19:43,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-08-09 14:19:50,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-08-09 14:19:52,303 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 14:19:53,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 3950, loss[loss=0.1201, beats_loss=0.01471, ecapa_loss=0.0005408, whisper_loss=0.09994, over 21693.00 frames. ], tot_loss[loss=0.1287, beats_loss=0.01378, ecapa_loss=0.0005524, whisper_loss=0.1094, over 3876367.02 frames. ], batch size: 89, lr: 4.23e-02, grad_scale: 64.0 2024-08-09 14:19:58,033 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 14:20:07,534 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 14:20:21,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=39600.0, ans=0.95 2024-08-09 14:20:28,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=39700.0, ans=10.0 2024-08-09 14:20:31,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.96 vs. limit=10.0 2024-08-09 14:20:32,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0 2024-08-09 14:20:37,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-09 14:20:43,757 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 14:20:52,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=39800.0, ans=0.125 2024-08-09 14:20:52,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2024-08-09 14:20:58,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=39900.0, ans=0.0 2024-08-09 14:20:59,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=39900.0, ans=0.0 2024-08-09 14:20:59,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=39900.0, ans=0.1 2024-08-09 14:21:05,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.19 vs. limit=15.0 2024-08-09 14:21:08,115 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-4000.pt 2024-08-09 14:21:12,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.104e+01 3.769e+01 4.628e+01 7.300e+01, threshold=7.538e+01, percent-clipped=0.0 2024-08-09 14:21:12,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4000, loss[loss=0.09288, beats_loss=0.01482, ecapa_loss=0.00055, whisper_loss=0.07256, over 14670.00 frames. ], tot_loss[loss=0.1283, beats_loss=0.01375, ecapa_loss=0.0005534, whisper_loss=0.109, over 3875945.84 frames. 
], batch size: 61, lr: 4.23e-02, grad_scale: 128.0 2024-08-09 14:21:14,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=29.72 vs. limit=15.0 2024-08-09 14:21:37,496 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 14:21:44,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=40200.0, ans=0.2 2024-08-09 14:21:49,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=40200.0, ans=0.125 2024-08-09 14:22:15,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=40400.0, ans=0.2 2024-08-09 14:22:20,350 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-09 14:22:23,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4050, loss[loss=0.1331, beats_loss=0.01033, ecapa_loss=0.0004849, whisper_loss=0.1179, over 20136.00 frames. ], tot_loss[loss=0.1284, beats_loss=0.01367, ecapa_loss=0.0005456, whisper_loss=0.1093, over 3853464.20 frames. ], batch size: 75, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:22:28,768 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-09 14:22:29,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=40500.0, ans=0.0 2024-08-09 14:22:34,088 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 23 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-09 14:22:41,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. 
limit=22.5 2024-08-09 14:22:41,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.02 vs. limit=10.0 2024-08-09 14:22:46,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=40600.0, ans=0.125 2024-08-09 14:22:53,329 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 14:23:03,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=40800.0, ans=0.0 2024-08-09 14:23:10,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=40800.0, ans=0.0 2024-08-09 14:23:14,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=40900.0, ans=0.125 2024-08-09 14:23:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=40900.0, ans=0.125 2024-08-09 14:23:28,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.975e+01 3.511e+01 4.257e+01 6.601e+01, threshold=7.021e+01, percent-clipped=0.0 2024-08-09 14:23:28,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4100, loss[loss=0.1011, beats_loss=0.01633, ecapa_loss=0.0005022, whisper_loss=0.07971, over 21077.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01364, ecapa_loss=0.0005433, whisper_loss=0.1096, over 3867604.80 frames. ], batch size: 90, lr: 4.22e-02, grad_scale: 128.0 2024-08-09 14:23:28,703 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 14:23:49,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=41100.0, ans=0.125 2024-08-09 14:23:52,233 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-09 14:24:11,715 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 14:24:12,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=41300.0, ans=0.125 2024-08-09 14:24:33,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4150, loss[loss=0.1087, beats_loss=0.0144, ecapa_loss=0.0005484, whisper_loss=0.08879, over 16498.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.01364, ecapa_loss=0.0005369, whisper_loss=0.1095, over 3874629.74 frames. ], batch size: 66, lr: 4.21e-02, grad_scale: 128.0 2024-08-09 14:24:40,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=41500.0, ans=0.125 2024-08-09 14:24:43,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=41500.0, ans=0.0 2024-08-09 14:24:47,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=41600.0, ans=0.2 2024-08-09 14:25:08,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2024-08-09 14:25:12,221 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 14:25:13,521 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-09 14:25:15,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=41800.0, ans=0.0 2024-08-09 14:25:16,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=41800.0, ans=0.0017826086956521745 2024-08-09 14:25:18,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41800.0, ans=0.1 2024-08-09 14:25:19,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=41800.0, ans=0.0 2024-08-09 14:25:20,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=41800.0, ans=0.125 2024-08-09 14:25:20,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2024-08-09 14:25:28,836 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 14:25:31,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=41900.0, ans=0.0 2024-08-09 14:25:37,463 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.939e+01 3.388e+01 4.308e+01 6.716e+01, threshold=6.777e+01, percent-clipped=0.0 2024-08-09 14:25:37,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4200, loss[loss=0.1318, beats_loss=0.01347, ecapa_loss=0.0004589, whisper_loss=0.1137, over 21103.00 frames. ], tot_loss[loss=0.1282, beats_loss=0.01367, ecapa_loss=0.0005341, whisper_loss=0.1092, over 3907912.21 frames. 
], batch size: 82, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:25:44,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42000.0, ans=0.1 2024-08-09 14:25:46,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=42000.0, ans=0.125 2024-08-09 14:25:52,048 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 14:25:54,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=42100.0, ans=0.125 2024-08-09 14:25:57,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.64 vs. limit=22.5 2024-08-09 14:26:13,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=42200.0, ans=0.2 2024-08-09 14:26:15,555 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 14:26:19,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=42300.0, ans=0.125 2024-08-09 14:26:24,336 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-09 14:26:30,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=15.0 2024-08-09 14:26:30,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2024-08-09 14:26:30,843 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-09 14:26:38,223 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 14:26:41,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4250, loss[loss=0.1319, beats_loss=0.01448, ecapa_loss=0.0004991, whisper_loss=0.1124, over 23493.00 frames. ], tot_loss[loss=0.1286, beats_loss=0.01363, ecapa_loss=0.0005306, whisper_loss=0.1097, over 3916670.35 frames. ], batch size: 92, lr: 4.20e-02, grad_scale: 128.0 2024-08-09 14:26:59,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=42600.0, ans=0.125 2024-08-09 14:27:08,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=42700.0, ans=10.0 2024-08-09 14:27:11,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=42700.0, ans=0.0 2024-08-09 14:27:16,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5 2024-08-09 14:27:27,427 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 14:27:29,824 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-09 14:27:35,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=42900.0, ans=0.125 2024-08-09 14:27:37,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.86 vs. 
limit=22.5 2024-08-09 14:27:46,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 3.006e+01 3.697e+01 4.408e+01 8.760e+01, threshold=7.393e+01, percent-clipped=1.0 2024-08-09 14:27:46,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4300, loss[loss=0.1167, beats_loss=0.01381, ecapa_loss=0.0006251, whisper_loss=0.0966, over 13392.00 frames. ], tot_loss[loss=0.1285, beats_loss=0.0136, ecapa_loss=0.0005291, whisper_loss=0.1096, over 3898803.90 frames. ], batch size: 60, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:00,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=43100.0, ans=0.2 2024-08-09 14:28:08,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=43100.0, ans=0.125 2024-08-09 14:28:08,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=43100.0, ans=0.05 2024-08-09 14:28:10,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=43100.0, ans=0.125 2024-08-09 14:28:16,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-09 14:28:49,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=43400.0, ans=0.125 2024-08-09 14:28:50,627 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 14:28:51,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4350, loss[loss=0.1274, beats_loss=0.01257, ecapa_loss=0.0005332, whisper_loss=0.1095, over 21936.00 frames. ], tot_loss[loss=0.1277, beats_loss=0.01355, ecapa_loss=0.000528, whisper_loss=0.1088, over 3884342.09 frames. 
], batch size: 90, lr: 4.19e-02, grad_scale: 128.0 2024-08-09 14:28:55,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=43500.0, ans=0.0 2024-08-09 14:29:00,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=43500.0, ans=0.2 2024-08-09 14:29:05,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=43600.0, ans=0.0 2024-08-09 14:29:16,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=43700.0, ans=0.02 2024-08-09 14:29:41,108 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-09 14:29:50,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=15.0 2024-08-09 14:29:57,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.950e+01 3.412e+01 4.173e+01 7.476e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-09 14:29:57,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4400, loss[loss=0.1104, beats_loss=0.01613, ecapa_loss=0.0004684, whisper_loss=0.08963, over 22266.00 frames. ], tot_loss[loss=0.127, beats_loss=0.01356, ecapa_loss=0.0005268, whisper_loss=0.1082, over 3880921.86 frames. 
], batch size: 90, lr: 4.18e-02, grad_scale: 128.0 2024-08-09 14:30:06,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=44000.0, ans=0.125 2024-08-09 14:30:08,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=44000.0, ans=0.0 2024-08-09 14:30:14,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=44100.0, ans=0.2 2024-08-09 14:30:44,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=44200.0, ans=0.2 2024-08-09 14:30:47,407 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 14:31:05,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.35 vs. limit=22.5 2024-08-09 14:31:10,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2024-08-09 14:31:17,285 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 36 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 14:31:19,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2024-08-09 14:31:22,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4450, loss[loss=0.1284, beats_loss=0.01366, ecapa_loss=0.0004394, whisper_loss=0.1103, over 22586.00 frames. ], tot_loss[loss=0.1266, beats_loss=0.01359, ecapa_loss=0.0005285, whisper_loss=0.1077, over 3877718.04 frames. 
], batch size: 91, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:31:27,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=44500.0, ans=0.125 2024-08-09 14:31:52,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=12.0 2024-08-09 14:32:07,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44700.0, ans=0.125 2024-08-09 14:32:35,512 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 14:32:35,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=44900.0, ans=0.125 2024-08-09 14:32:48,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+01 3.039e+01 3.733e+01 4.656e+01 8.279e+01, threshold=7.465e+01, percent-clipped=2.0 2024-08-09 14:32:48,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4500, loss[loss=0.1466, beats_loss=0.01134, ecapa_loss=0.000562, whisper_loss=0.1296, over 23052.00 frames. ], tot_loss[loss=0.1269, beats_loss=0.01358, ecapa_loss=0.0005232, whisper_loss=0.1081, over 3882932.97 frames. ], batch size: 89, lr: 4.17e-02, grad_scale: 128.0 2024-08-09 14:32:52,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45000.0, ans=0.125 2024-08-09 14:33:04,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. 
limit=15.0 2024-08-09 14:33:06,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=45100.0, ans=0.125 2024-08-09 14:33:16,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=45100.0, ans=0.05 2024-08-09 14:33:24,666 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 14:33:30,428 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.188e+00 2024-08-09 14:33:48,563 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 14:34:06,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=45400.0, ans=0.0010000000000000009 2024-08-09 14:34:07,944 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-09 14:34:11,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4550, loss[loss=0.1346, beats_loss=0.01276, ecapa_loss=0.0005144, whisper_loss=0.1167, over 22707.00 frames. ], tot_loss[loss=0.1273, beats_loss=0.01357, ecapa_loss=0.0005195, whisper_loss=0.1085, over 3885061.56 frames. ], batch size: 89, lr: 4.16e-02, grad_scale: 128.0 2024-08-09 14:34:11,280 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-09 14:34:12,555 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 14:34:26,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45600.0, ans=0.1 2024-08-09 14:34:38,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.21 vs. 
limit=15.0 2024-08-09 14:34:49,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=45700.0, ans=0.125 2024-08-09 14:34:50,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=45700.0, ans=0.5 2024-08-09 14:35:06,006 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-09 14:35:22,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45900.0, ans=0.1 2024-08-09 14:35:32,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.954e+01 3.369e+01 4.036e+01 7.171e+01, threshold=6.737e+01, percent-clipped=0.0 2024-08-09 14:35:32,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4600, loss[loss=0.1424, beats_loss=0.01391, ecapa_loss=0.0004524, whisper_loss=0.124, over 18472.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.01366, ecapa_loss=0.0005188, whisper_loss=0.1076, over 3869373.04 frames. ], batch size: 70, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:35:36,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=46000.0, ans=0.0 2024-08-09 14:35:43,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2024-08-09 14:36:06,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.53 vs. 
limit=15.0 2024-08-09 14:36:28,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=46300.0, ans=0.125 2024-08-09 14:36:31,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.24 vs. limit=22.5 2024-08-09 14:36:36,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=46400.0, ans=0.125 2024-08-09 14:36:54,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4650, loss[loss=0.1299, beats_loss=0.01543, ecapa_loss=0.0004605, whisper_loss=0.1099, over 21520.00 frames. ], tot_loss[loss=0.1269, beats_loss=0.01354, ecapa_loss=0.0005207, whisper_loss=0.1081, over 3861735.77 frames. ], batch size: 86, lr: 4.15e-02, grad_scale: 128.0 2024-08-09 14:37:00,380 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 14:37:09,165 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 14:37:28,648 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-09 14:37:32,656 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-09 14:37:40,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46700.0, ans=0.1 2024-08-09 14:37:49,313 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-09 14:37:49,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=46800.0, ans=0.125 2024-08-09 14:38:00,290 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 14:38:17,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 3.046e+01 3.609e+01 4.617e+01 7.306e+01, threshold=7.217e+01, percent-clipped=2.0 2024-08-09 14:38:17,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4700, loss[loss=0.1283, beats_loss=0.01155, ecapa_loss=0.0007129, whisper_loss=0.1097, over 20805.00 frames. ], tot_loss[loss=0.1277, beats_loss=0.01351, ecapa_loss=0.000518, whisper_loss=0.109, over 3854751.98 frames. ], batch size: 92, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:38:41,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=47100.0, ans=0.2 2024-08-09 14:38:47,549 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-09 14:39:00,918 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-09 14:39:01,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=47200.0, ans=0.125 2024-08-09 14:39:09,905 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 14:39:38,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2024-08-09 14:39:43,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4750, loss[loss=0.1019, beats_loss=0.01503, ecapa_loss=0.00042, whisper_loss=0.08267, over 19921.00 frames. ], tot_loss[loss=0.1274, beats_loss=0.01363, ecapa_loss=0.000516, whisper_loss=0.1086, over 3889372.50 frames. 
], batch size: 79, lr: 4.14e-02, grad_scale: 128.0 2024-08-09 14:39:43,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=47500.0, ans=0.0005434782608695655 2024-08-09 14:39:49,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-09 14:40:02,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=15.0 2024-08-09 14:40:09,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=47600.0, ans=0.025 2024-08-09 14:40:33,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=47800.0, ans=0.125 2024-08-09 14:40:45,896 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 14:40:48,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=47900.0, ans=0.2 2024-08-09 14:41:05,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.158e+01 3.572e+01 4.344e+01 1.074e+02, threshold=7.144e+01, percent-clipped=1.0 2024-08-09 14:41:05,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4800, loss[loss=0.1095, beats_loss=0.01619, ecapa_loss=0.0005111, whisper_loss=0.08821, over 15307.00 frames. ], tot_loss[loss=0.127, beats_loss=0.01354, ecapa_loss=0.0005164, whisper_loss=0.1083, over 3896208.03 frames. ], batch size: 64, lr: 4.13e-02, grad_scale: 128.0 2024-08-09 14:41:21,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.31 vs. 
limit=15.0 2024-08-09 14:41:35,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=19.00 vs. limit=15.0 2024-08-09 14:41:37,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5 2024-08-09 14:41:52,515 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 14:41:59,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0 2024-08-09 14:42:08,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=48300.0, ans=15.0 2024-08-09 14:42:31,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4850, loss[loss=0.1158, beats_loss=0.01535, ecapa_loss=0.0004033, whisper_loss=0.0964, over 21353.00 frames. ], tot_loss[loss=0.1278, beats_loss=0.01356, ecapa_loss=0.0005119, whisper_loss=0.1091, over 3928096.17 frames. ], batch size: 84, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:42:32,019 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 14:42:37,870 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 14:42:45,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=48500.0, ans=0.125 2024-08-09 14:43:09,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=48700.0, ans=0.125 2024-08-09 14:43:17,956 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 14:43:25,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=48800.0, ans=0.125 2024-08-09 14:43:31,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=48800.0, ans=0.125 2024-08-09 14:43:51,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=48900.0, ans=0.125 2024-08-09 14:43:53,630 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 14:43:55,553 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-09 14:44:00,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.275e+01 3.682e+01 4.305e+01 7.376e+01, threshold=7.365e+01, percent-clipped=1.0 2024-08-09 14:44:00,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4900, loss[loss=0.08942, beats_loss=0.01722, ecapa_loss=0.0004667, whisper_loss=0.06754, over 20658.00 frames. ], tot_loss[loss=0.1269, beats_loss=0.01364, ecapa_loss=0.0005083, whisper_loss=0.1082, over 3893777.26 frames. ], batch size: 87, lr: 4.12e-02, grad_scale: 128.0 2024-08-09 14:44:21,821 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 14:44:31,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=49100.0, ans=0.2 2024-08-09 14:44:38,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49200.0, ans=0.125 2024-08-09 14:44:52,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.31 vs. 
limit=22.5 2024-08-09 14:44:53,493 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 14:45:01,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=49300.0, ans=0.0 2024-08-09 14:45:02,534 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 14:45:26,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 4950, loss[loss=0.1102, beats_loss=0.01627, ecapa_loss=0.0004176, whisper_loss=0.08971, over 19186.00 frames. ], tot_loss[loss=0.1264, beats_loss=0.01364, ecapa_loss=0.0005026, whisper_loss=0.1078, over 3888587.74 frames. ], batch size: 74, lr: 4.11e-02, grad_scale: 128.0 2024-08-09 14:45:27,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=49500.0, ans=0.0 2024-08-09 14:45:28,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-09 14:46:11,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=49700.0, ans=0.2 2024-08-09 14:46:30,283 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-09 14:46:42,730 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 14:46:51,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.48 vs. 
limit=22.5 2024-08-09 14:46:52,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+01 3.042e+01 3.499e+01 4.372e+01 7.194e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-09 14:46:52,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5000, loss[loss=0.08091, beats_loss=0.01267, ecapa_loss=0.0005151, whisper_loss=0.06308, over 15474.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01364, ecapa_loss=0.0005032, whisper_loss=0.1075, over 3897664.28 frames. ], batch size: 60, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:46:59,162 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-09 14:47:12,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=50100.0, ans=0.125 2024-08-09 14:47:25,210 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 14:47:30,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=15.0 2024-08-09 14:47:36,064 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 14:47:39,312 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 14:47:41,002 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 14:47:48,606 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 14:47:55,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=50400.0, ans=0.0 2024-08-09 14:48:01,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. 
limit=15.0 2024-08-09 14:48:08,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5050, loss[loss=0.1322, beats_loss=0.01435, ecapa_loss=0.0004065, whisper_loss=0.1138, over 15475.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01368, ecapa_loss=0.0004996, whisper_loss=0.107, over 3865418.74 frames. ], batch size: 56, lr: 4.10e-02, grad_scale: 128.0 2024-08-09 14:48:10,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2024-08-09 14:48:14,547 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-09 14:48:26,784 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 14:48:31,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2024-08-09 14:48:38,153 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-09 14:48:38,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50700.0, ans=0.1 2024-08-09 14:48:49,974 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 31 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-09 14:49:15,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 3.052e+01 3.532e+01 4.388e+01 7.103e+01, threshold=7.064e+01, percent-clipped=2.0 2024-08-09 14:49:15,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5100, loss[loss=0.1255, beats_loss=0.01498, ecapa_loss=0.0004238, whisper_loss=0.1063, over 23064.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.01362, ecapa_loss=0.0004984, whisper_loss=0.1076, over 3874674.73 frames. ], batch size: 89, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:49:16,804 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 14:49:21,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=51000.0, ans=0.2 2024-08-09 14:49:27,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-09 14:49:31,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=51100.0, ans=0.0 2024-08-09 14:49:31,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=51100.0, ans=0.0 2024-08-09 14:49:34,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-09 14:49:37,431 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-09 14:49:43,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=51200.0, ans=0.0 2024-08-09 14:49:46,816 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 14:50:06,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=51400.0, ans=0.05 2024-08-09 14:50:19,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-09 14:50:20,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5150, loss[loss=0.1208, beats_loss=0.01525, ecapa_loss=0.0004448, whisper_loss=0.1011, over 21910.00 frames. ], tot_loss[loss=0.1262, beats_loss=0.0136, ecapa_loss=0.0004959, whisper_loss=0.1076, over 3876854.05 frames. 
], batch size: 86, lr: 4.09e-02, grad_scale: 128.0 2024-08-09 14:50:33,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51600.0, ans=0.1 2024-08-09 14:50:34,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=51600.0, ans=0.5 2024-08-09 14:50:38,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.19 vs. limit=22.5 2024-08-09 14:50:43,187 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-09 14:50:54,771 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-09 14:50:59,031 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 14:51:25,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.954e+01 3.465e+01 4.225e+01 6.973e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 14:51:25,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5200, loss[loss=0.1003, beats_loss=0.01066, ecapa_loss=0.0006035, whisper_loss=0.08357, over 14356.00 frames. ], tot_loss[loss=0.1255, beats_loss=0.01351, ecapa_loss=0.0004941, whisper_loss=0.107, over 3866537.37 frames. ], batch size: 59, lr: 4.08e-02, grad_scale: 128.0 2024-08-09 14:52:14,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.73 vs. limit=5.0 2024-08-09 14:52:19,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=52400.0, ans=0.125 2024-08-09 14:52:28,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5250, loss[loss=0.1352, beats_loss=0.01306, ecapa_loss=0.0005129, whisper_loss=0.117, over 21984.00 frames. 
], tot_loss[loss=0.1255, beats_loss=0.01354, ecapa_loss=0.0004948, whisper_loss=0.107, over 3893543.25 frames. ], batch size: 88, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:52:32,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=52500.0, ans=0.0 2024-08-09 14:52:33,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=52500.0, ans=0.125 2024-08-09 14:52:35,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=52500.0, ans=0.0 2024-08-09 14:52:59,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=52700.0, ans=0.2 2024-08-09 14:53:02,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=52700.0, ans=0.95 2024-08-09 14:53:06,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52800.0, ans=0.1 2024-08-09 14:53:29,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=52900.0, ans=0.125 2024-08-09 14:53:33,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.986e+01 3.430e+01 3.984e+01 5.910e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 14:53:33,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5300, loss[loss=0.1592, beats_loss=0.01122, ecapa_loss=0.0003966, whisper_loss=0.1441, over 20480.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01346, ecapa_loss=0.0004905, whisper_loss=0.1073, over 3884300.24 frames. ], batch size: 71, lr: 4.07e-02, grad_scale: 128.0 2024-08-09 14:53:38,707 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-09 14:54:01,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=53200.0, ans=0.0 2024-08-09 14:54:04,229 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 14:54:12,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=53300.0, ans=0.125 2024-08-09 14:54:19,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.82 vs. limit=22.5 2024-08-09 14:54:38,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5350, loss[loss=0.1259, beats_loss=0.01251, ecapa_loss=0.0005575, whisper_loss=0.1079, over 20760.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01339, ecapa_loss=0.0004889, whisper_loss=0.1066, over 3858280.56 frames. ], batch size: 86, lr: 4.06e-02, grad_scale: 128.0 2024-08-09 14:54:54,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=53600.0, ans=0.2 2024-08-09 14:54:59,191 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 14:55:03,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53700.0, ans=0.0 2024-08-09 14:55:22,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=53800.0, ans=0.07 2024-08-09 14:55:30,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53900.0, ans=0.1 2024-08-09 14:55:30,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=53900.0, ans=0.125 2024-08-09 14:55:35,599 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 17 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-09 14:55:37,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0 2024-08-09 14:55:43,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.073e+01 3.494e+01 4.285e+01 8.308e+01, threshold=6.988e+01, percent-clipped=2.0 2024-08-09 14:55:43,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5400, loss[loss=0.1359, beats_loss=0.01331, ecapa_loss=0.0004253, whisper_loss=0.1183, over 22625.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01345, ecapa_loss=0.0004901, whisper_loss=0.1063, over 3861873.32 frames. ], batch size: 89, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:55:44,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2024-08-09 14:55:53,091 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 23 from LS+wenet, 10 from Vox, 20 fro AS 2024-08-09 14:56:02,260 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 14:56:03,443 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-09 14:56:14,873 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-09 14:56:17,292 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 14:56:19,786 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-09 14:56:22,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54300.0, ans=0.1 2024-08-09 14:56:26,508 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 14:56:30,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=54300.0, ans=0.125 2024-08-09 14:56:30,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=54300.0, ans=0.0 2024-08-09 14:56:36,824 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 14:56:45,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=12.0 2024-08-09 14:56:47,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5450, loss[loss=0.123, beats_loss=0.01354, ecapa_loss=0.0004629, whisper_loss=0.1048, over 20592.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01352, ecapa_loss=0.0004864, whisper_loss=0.1063, over 3896512.83 frames. ], batch size: 84, lr: 4.05e-02, grad_scale: 128.0 2024-08-09 14:56:52,289 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 14:57:02,568 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 14:57:11,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=54600.0, ans=0.125 2024-08-09 14:57:14,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-08-09 14:57:15,351 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 14:57:24,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=54700.0, ans=0.125 2024-08-09 14:57:27,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-08-09 14:57:35,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54800.0, ans=0.1 2024-08-09 14:57:39,526 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 14:57:51,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 3.087e+01 3.659e+01 4.293e+01 7.884e+01, threshold=7.318e+01, percent-clipped=2.0 2024-08-09 14:57:51,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5500, loss[loss=0.1367, beats_loss=0.01129, ecapa_loss=0.0005348, whisper_loss=0.1201, over 21021.00 frames. ], tot_loss[loss=0.1257, beats_loss=0.01346, ecapa_loss=0.0004855, whisper_loss=0.1074, over 3928622.86 frames. ], batch size: 82, lr: 4.04e-02, grad_scale: 128.0 2024-08-09 14:57:58,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=55000.0, ans=0.2 2024-08-09 14:57:59,957 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 14:58:05,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55100.0, ans=0.1 2024-08-09 14:58:27,806 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-09 14:58:55,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=55500.0, ans=0.2 2024-08-09 14:58:55,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5550, loss[loss=0.1246, beats_loss=0.01505, ecapa_loss=0.0004555, whisper_loss=0.105, over 21487.00 frames. ], tot_loss[loss=0.1254, beats_loss=0.01345, ecapa_loss=0.0004865, whisper_loss=0.1071, over 3944041.82 frames. ], batch size: 85, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 14:59:03,540 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 14:59:13,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55600.0, ans=0.1 2024-08-09 14:59:15,033 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-09 14:59:28,701 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-09 14:59:48,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=55900.0, ans=0.04949747468305833 2024-08-09 14:59:50,335 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-09 14:59:56,029 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 14:59:59,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.194e+01 3.634e+01 4.385e+01 7.525e+01, threshold=7.268e+01, percent-clipped=1.0 2024-08-09 14:59:59,674 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5600, loss[loss=0.116, beats_loss=0.01465, ecapa_loss=0.0004921, whisper_loss=0.09641, over 21480.00 frames. ], tot_loss[loss=0.1258, beats_loss=0.01341, ecapa_loss=0.0004884, whisper_loss=0.1075, over 3958334.11 frames. ], batch size: 88, lr: 4.03e-02, grad_scale: 128.0 2024-08-09 15:00:14,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=56100.0, ans=0.125 2024-08-09 15:00:16,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=22.5 2024-08-09 15:00:27,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2024-08-09 15:00:33,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56200.0, ans=0.125 2024-08-09 15:00:37,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=56300.0, ans=0.0 2024-08-09 15:00:37,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2024-08-09 15:00:40,737 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 30 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 15:00:41,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=56300.0, ans=0.125 2024-08-09 15:00:45,778 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 15:00:47,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=56300.0, ans=0.125 2024-08-09 15:00:49,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=56400.0, ans=0.125 2024-08-09 15:00:49,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2024-08-09 15:00:50,837 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-09 15:01:02,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-09 15:01:03,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5650, loss[loss=0.1112, beats_loss=0.01785, ecapa_loss=0.0004355, whisper_loss=0.08899, over 21849.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01352, ecapa_loss=0.000486, whisper_loss=0.1065, over 3939716.09 frames. ], batch size: 92, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:01:07,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2024-08-09 15:01:10,569 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-09 15:01:13,440 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.347e-02 2024-08-09 15:01:21,974 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 15:01:24,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=56600.0, ans=0.125 2024-08-09 15:01:28,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2024-08-09 15:01:53,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56800.0, ans=0.1 2024-08-09 15:01:55,490 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 15:02:00,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=56900.0, ans=0.125 2024-08-09 15:02:04,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56900.0, ans=0.1 2024-08-09 15:02:08,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.137e+01 3.741e+01 4.572e+01 6.525e+01, threshold=7.481e+01, percent-clipped=0.0 2024-08-09 15:02:08,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5700, loss[loss=0.1329, beats_loss=0.01428, ecapa_loss=0.0004821, whisper_loss=0.1138, over 21279.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.0136, ecapa_loss=0.0004838, whisper_loss=0.1059, over 3957977.97 frames. ], batch size: 84, lr: 4.02e-02, grad_scale: 128.0 2024-08-09 15:02:08,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=57000.0, ans=0.125 2024-08-09 15:02:15,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=57000.0, ans=0.0 2024-08-09 15:02:20,333 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-09 15:02:52,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=57300.0, ans=0.125 2024-08-09 15:02:53,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=57300.0, ans=0.2 2024-08-09 15:02:56,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-09 15:03:03,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.69 vs. limit=15.0 2024-08-09 15:03:08,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=57400.0, ans=0.0 2024-08-09 15:03:13,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5750, loss[loss=0.1211, beats_loss=0.01595, ecapa_loss=0.0004497, whisper_loss=0.1006, over 21427.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01353, ecapa_loss=0.0004868, whisper_loss=0.1064, over 3929597.69 frames. ], batch size: 91, lr: 4.01e-02, grad_scale: 128.0 2024-08-09 15:03:13,498 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-09 15:03:22,526 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 15:03:23,807 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 15:03:43,794 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-09 15:03:44,901 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-09 15:03:46,274 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
23 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-09 15:03:54,308 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 15:04:17,794 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-09 15:04:18,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.931e+01 3.260e+01 3.924e+01 8.527e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-09 15:04:18,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5800, loss[loss=0.1119, beats_loss=0.011, ecapa_loss=0.0005037, whisper_loss=0.09583, over 14626.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01352, ecapa_loss=0.0004865, whisper_loss=0.1063, over 3914755.85 frames. ], batch size: 59, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:04:21,748 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-09 15:04:41,752 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-09 15:05:02,053 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-09 15:05:04,561 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-09 15:05:08,453 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-09 15:05:11,121 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 15:05:24,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=58500.0, ans=0.125 2024-08-09 15:05:25,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5850, loss[loss=0.1223, beats_loss=0.01452, ecapa_loss=0.0004648, whisper_loss=0.1031, over 22195.00 frames. 
], tot_loss[loss=0.1246, beats_loss=0.01353, ecapa_loss=0.0004871, whisper_loss=0.1062, over 3925679.10 frames. ], batch size: 89, lr: 4.00e-02, grad_scale: 128.0 2024-08-09 15:05:36,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2024-08-09 15:05:50,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=58600.0, ans=0.0 2024-08-09 15:05:54,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2024-08-09 15:05:56,954 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-09 15:06:09,084 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 15:06:10,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=58800.0, ans=0.04949747468305833 2024-08-09 15:06:10,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=58800.0, ans=0.125 2024-08-09 15:06:22,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=58900.0, ans=0.125 2024-08-09 15:06:23,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=58900.0, ans=0.125 2024-08-09 15:06:25,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.15 vs. 
limit=22.5 2024-08-09 15:06:34,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 3.149e+01 3.698e+01 4.735e+01 7.316e+01, threshold=7.396e+01, percent-clipped=3.0 2024-08-09 15:06:34,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5900, loss[loss=0.0992, beats_loss=0.01601, ecapa_loss=0.0004342, whisper_loss=0.07884, over 17603.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.01352, ecapa_loss=0.000487, whisper_loss=0.1063, over 3924452.68 frames. ], batch size: 70, lr: 3.99e-02, grad_scale: 128.0 2024-08-09 15:06:39,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=59000.0, ans=0.2 2024-08-09 15:06:48,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59100.0, ans=0.1 2024-08-09 15:06:52,933 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-09 15:07:06,041 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 15:07:09,001 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 15:07:26,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=59400.0, ans=15.0 2024-08-09 15:07:36,803 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 15:07:37,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=59400.0, ans=0.0 2024-08-09 15:07:38,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=59400.0, ans=0.125 2024-08-09 15:07:40,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 5950, loss[loss=0.1286, beats_loss=0.01056, ecapa_loss=0.0004952, whisper_loss=0.113, over 23133.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01342, ecapa_loss=0.0004832, whisper_loss=0.1068, over 3910267.33 frames. ], batch size: 91, lr: 3.98e-02, grad_scale: 128.0 2024-08-09 15:07:55,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=59600.0, ans=0.125 2024-08-09 15:08:03,719 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 14 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 15:08:06,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.50 vs. limit=15.0 2024-08-09 15:08:09,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-09 15:08:40,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=59900.0, ans=0.0 2024-08-09 15:08:52,068 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.855e+01 3.241e+01 4.234e+01 7.891e+01, threshold=6.482e+01, percent-clipped=2.0 2024-08-09 15:08:52,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6000, loss[loss=0.124, beats_loss=0.01478, ecapa_loss=0.000435, whisper_loss=0.1049, over 20244.00 frames. 
], tot_loss[loss=0.1247, beats_loss=0.01341, ecapa_loss=0.0004782, whisper_loss=0.1065, over 3910842.72 frames. ], batch size: 82, lr: 3.98e-02, grad_scale: 256.0 2024-08-09 15:08:52,091 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 15:09:28,560 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on ASR_libri: loss=0.2951, beats_loss=0, ecapa_loss=0.001297, whisper_loss=0.2822, over 922467.00 frames. 2024-08-09 15:09:46,194 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on SV_voxceleb1: loss=0.01236, beats_loss=0, ecapa_loss=0.001236, whisper_loss=0, over 939242.00 frames. 2024-08-09 15:11:29,819 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on AT_audioset: loss=0.03246, beats_loss=0.03246, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 15:11:29,824 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 15:11:30,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=60000.0, ans=0.0 2024-08-09 15:11:45,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=60100.0, ans=0.0 2024-08-09 15:11:56,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=60100.0, ans=0.1 2024-08-09 15:12:07,216 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 15:12:11,484 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 14 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 15:12:18,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. 
limit=15.0 2024-08-09 15:12:26,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=60300.0, ans=0.2 2024-08-09 15:12:29,799 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 15:12:45,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6050, loss[loss=0.1068, beats_loss=0.01589, ecapa_loss=0.000506, whisper_loss=0.08585, over 20292.00 frames. ], tot_loss[loss=0.1242, beats_loss=0.01343, ecapa_loss=0.0004743, whisper_loss=0.106, over 3876998.00 frames. ], batch size: 85, lr: 3.97e-02, grad_scale: 256.0 2024-08-09 15:12:49,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=60500.0, ans=0.2 2024-08-09 15:13:02,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60600.0, ans=0.125 2024-08-09 15:13:05,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60600.0, ans=0.125 2024-08-09 15:13:16,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-08-09 15:13:18,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=60700.0, ans=0.125 2024-08-09 15:13:25,266 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 15:13:28,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=60800.0, ans=0.125 2024-08-09 15:13:28,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.78 vs. 
limit=22.5 2024-08-09 15:13:36,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2024-08-09 15:13:40,730 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-09 15:13:40,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=60800.0, ans=0.0 2024-08-09 15:13:52,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=60900.0, ans=0.125 2024-08-09 15:13:53,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=60900.0, ans=0.125 2024-08-09 15:13:56,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=60900.0, ans=0.0 2024-08-09 15:13:57,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-09 15:13:59,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 3.011e+01 3.542e+01 4.337e+01 6.873e+01, threshold=7.084e+01, percent-clipped=1.0 2024-08-09 15:13:59,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6100, loss[loss=0.1407, beats_loss=0.01315, ecapa_loss=0.0003765, whisper_loss=0.1238, over 19398.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01346, ecapa_loss=0.0004744, whisper_loss=0.1057, over 3876533.43 frames. ], batch size: 73, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:14:20,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=61100.0, ans=0.125 2024-08-09 15:14:25,929 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
28 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 15:14:32,245 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-09 15:14:50,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=61300.0, ans=0.125 2024-08-09 15:15:01,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=61400.0, ans=0.125 2024-08-09 15:15:06,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=61400.0, ans=0.0 2024-08-09 15:15:07,575 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-09 15:15:13,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6150, loss[loss=0.1457, beats_loss=0.01051, ecapa_loss=0.0005086, whisper_loss=0.1301, over 23414.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01357, ecapa_loss=0.000474, whisper_loss=0.1052, over 3884099.13 frames. ], batch size: 93, lr: 3.96e-02, grad_scale: 256.0 2024-08-09 15:15:29,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=61600.0, ans=0.0 2024-08-09 15:15:30,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0 2024-08-09 15:15:32,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=61600.0, ans=0.2 2024-08-09 15:15:36,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=61600.0, ans=0.09899494936611666 2024-08-09 15:15:39,571 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 15:15:40,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-09 15:15:44,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=61700.0, ans=0.125 2024-08-09 15:15:48,392 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-09 15:16:26,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=62000.0, ans=0.0 2024-08-09 15:16:27,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.116e+01 3.579e+01 4.385e+01 6.920e+01, threshold=7.157e+01, percent-clipped=0.0 2024-08-09 15:16:27,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6200, loss[loss=0.09549, beats_loss=0.014, ecapa_loss=0.0004634, whisper_loss=0.07685, over 14950.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01358, ecapa_loss=0.0004751, whisper_loss=0.1044, over 3863058.78 frames. ], batch size: 60, lr: 3.95e-02, grad_scale: 256.0 2024-08-09 15:16:28,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=62000.0, ans=0.125 2024-08-09 15:16:35,243 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 15:16:43,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. 
limit=12.0 2024-08-09 15:17:04,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=62200.0, ans=0.125 2024-08-09 15:17:12,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2024-08-09 15:17:28,160 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-09 15:17:43,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6250, loss[loss=0.1424, beats_loss=0.008477, ecapa_loss=0.0005077, whisper_loss=0.1288, over 18803.00 frames. ], tot_loss[loss=0.1234, beats_loss=0.01346, ecapa_loss=0.0004705, whisper_loss=0.1052, over 3895392.99 frames. ], batch size: 71, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:18:01,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=15.0 2024-08-09 15:18:18,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=62700.0, ans=0.125 2024-08-09 15:18:20,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=62700.0, ans=0.2 2024-08-09 15:18:27,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=62800.0, ans=0.2 2024-08-09 15:19:00,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.251e+01 2.965e+01 3.406e+01 4.255e+01 1.028e+02, threshold=6.812e+01, percent-clipped=2.0 2024-08-09 15:19:00,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6300, loss[loss=0.09368, beats_loss=0.0147, ecapa_loss=0.0004949, whisper_loss=0.07403, over 20517.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01344, ecapa_loss=0.0004686, whisper_loss=0.1056, over 3892335.17 frames. 
], batch size: 86, lr: 3.94e-02, grad_scale: 256.0 2024-08-09 15:19:00,308 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 15:19:07,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=63000.0, ans=0.125 2024-08-09 15:19:24,409 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 15:19:32,620 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 15:19:56,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2024-08-09 15:20:00,713 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 15:20:18,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6350, loss[loss=0.1241, beats_loss=0.0125, ecapa_loss=0.0005124, whisper_loss=0.1065, over 18558.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.01344, ecapa_loss=0.0004696, whisper_loss=0.1048, over 3867598.36 frames. ], batch size: 76, lr: 3.93e-02, grad_scale: 256.0 2024-08-09 15:20:20,331 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 15:20:34,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-09 15:20:42,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. 
limit=6.0 2024-08-09 15:20:53,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63700.0, ans=0.1 2024-08-09 15:20:54,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=63700.0, ans=0.0 2024-08-09 15:21:30,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=63900.0, ans=0.125 2024-08-09 15:21:32,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=63900.0, ans=15.0 2024-08-09 15:21:38,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.075e+01 3.568e+01 4.201e+01 6.933e+01, threshold=7.136e+01, percent-clipped=1.0 2024-08-09 15:21:38,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6400, loss[loss=0.1467, beats_loss=0.009245, ecapa_loss=0.0005585, whisper_loss=0.1319, over 18210.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01339, ecapa_loss=0.0004697, whisper_loss=0.1053, over 3863527.18 frames. ], batch size: 73, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:21:42,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=64000.0, ans=0.2 2024-08-09 15:21:50,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=64000.0, ans=0.125 2024-08-09 15:21:52,269 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 15:21:54,867 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-09 15:22:05,704 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 15:22:14,942 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 15:22:20,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-09 15:22:37,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=64300.0, ans=0.0 2024-08-09 15:22:57,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6450, loss[loss=0.1078, beats_loss=0.01541, ecapa_loss=0.0003735, whisper_loss=0.08868, over 14133.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01345, ecapa_loss=0.000465, whisper_loss=0.1059, over 3869689.42 frames. ], batch size: 54, lr: 3.92e-02, grad_scale: 256.0 2024-08-09 15:23:04,439 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-09 15:23:11,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-09 15:23:20,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=64600.0, ans=0.125 2024-08-09 15:23:26,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=64600.0, ans=0.125 2024-08-09 15:23:29,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=64700.0, ans=0.0 2024-08-09 15:23:46,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.57 vs. limit=22.5 2024-08-09 15:23:58,853 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
27 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-09 15:24:01,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=64900.0, ans=0.07 2024-08-09 15:24:01,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=64900.0, ans=0.0 2024-08-09 15:24:17,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.103e+01 3.527e+01 4.351e+01 8.335e+01, threshold=7.053e+01, percent-clipped=1.0 2024-08-09 15:24:17,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6500, loss[loss=0.1373, beats_loss=0.01487, ecapa_loss=0.0003366, whisper_loss=0.119, over 17550.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.0134, ecapa_loss=0.0004652, whisper_loss=0.1061, over 3878162.85 frames. ], batch size: 66, lr: 3.91e-02, grad_scale: 256.0 2024-08-09 15:24:30,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=65000.0, ans=0.125 2024-08-09 15:24:37,398 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-09 15:24:43,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=65100.0, ans=0.125 2024-08-09 15:24:54,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.96 vs. 
limit=22.5 2024-08-09 15:25:12,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=65300.0, ans=0.125 2024-08-09 15:25:18,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=65300.0, ans=0.0 2024-08-09 15:25:37,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6550, loss[loss=0.1241, beats_loss=0.01512, ecapa_loss=0.0004286, whisper_loss=0.1046, over 22168.00 frames. ], tot_loss[loss=0.1248, beats_loss=0.01331, ecapa_loss=0.0004644, whisper_loss=0.1069, over 3901088.61 frames. ], batch size: 92, lr: 3.91e-02, grad_scale: 256.0 2024-08-09 15:25:43,507 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-09 15:26:00,115 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-09 15:26:09,162 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 15:26:18,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65700.0, ans=0.125 2024-08-09 15:26:51,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=65900.0, ans=0.04949747468305833 2024-08-09 15:26:57,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 3.063e+01 3.628e+01 4.391e+01 7.750e+01, threshold=7.256e+01, percent-clipped=3.0 2024-08-09 15:26:57,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6600, loss[loss=0.136, beats_loss=0.009789, ecapa_loss=0.0005176, whisper_loss=0.121, over 23655.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01328, ecapa_loss=0.000464, whisper_loss=0.1077, over 3916834.06 frames. 
], batch size: 90, lr: 3.90e-02, grad_scale: 256.0 2024-08-09 15:27:09,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66000.0, ans=0.1 2024-08-09 15:27:16,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=66100.0, ans=0.5 2024-08-09 15:27:26,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66200.0, ans=0.125 2024-08-09 15:27:31,417 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 15:27:37,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=66200.0, ans=0.07 2024-08-09 15:27:46,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66300.0, ans=0.125 2024-08-09 15:27:54,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=66300.0, ans=0.04949747468305833 2024-08-09 15:27:55,455 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 15:27:58,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=66400.0, ans=0.1 2024-08-09 15:28:13,371 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 15:28:14,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6650, loss[loss=0.1215, beats_loss=0.01476, ecapa_loss=0.0003833, whisper_loss=0.1029, over 17711.00 frames. ], tot_loss[loss=0.125, beats_loss=0.01329, ecapa_loss=0.0004609, whisper_loss=0.1071, over 3875239.95 frames. 
], batch size: 66, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:28:15,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-09 15:28:20,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=66500.0, ans=0.2 2024-08-09 15:28:34,082 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 15:28:41,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.21 vs. limit=10.0 2024-08-09 15:28:46,076 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.178e+00 2024-08-09 15:28:57,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=66700.0, ans=0.125 2024-08-09 15:29:02,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=66800.0, ans=0.125 2024-08-09 15:29:08,453 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 15:29:11,071 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-09 15:29:14,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
limit=15.0 2024-08-09 15:29:21,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66900.0, ans=0.125 2024-08-09 15:29:29,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=67000.0, ans=0.125 2024-08-09 15:29:31,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.012e+01 3.433e+01 4.224e+01 7.038e+01, threshold=6.866e+01, percent-clipped=0.0 2024-08-09 15:29:31,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6700, loss[loss=0.1137, beats_loss=0.01339, ecapa_loss=0.0004388, whisper_loss=0.09593, over 15041.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.0133, ecapa_loss=0.0004594, whisper_loss=0.1072, over 3853424.78 frames. ], batch size: 60, lr: 3.89e-02, grad_scale: 256.0 2024-08-09 15:30:10,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=67200.0, ans=0.0 2024-08-09 15:30:13,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=67200.0, ans=0.2 2024-08-09 15:30:23,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0 2024-08-09 15:30:30,042 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 15:30:47,543 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6750, loss[loss=0.1359, beats_loss=0.01568, ecapa_loss=0.0005255, whisper_loss=0.115, over 19828.00 frames. ], tot_loss[loss=0.1256, beats_loss=0.01331, ecapa_loss=0.0004592, whisper_loss=0.1077, over 3862291.44 frames. 
], batch size: 83, lr: 3.88e-02, grad_scale: 256.0 2024-08-09 15:32:03,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.094e+01 3.540e+01 4.120e+01 7.157e+01, threshold=7.079e+01, percent-clipped=1.0 2024-08-09 15:32:03,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6800, loss[loss=0.0992, beats_loss=0.01722, ecapa_loss=0.0003408, whisper_loss=0.07857, over 23212.00 frames. ], tot_loss[loss=0.1251, beats_loss=0.01338, ecapa_loss=0.0004588, whisper_loss=0.1071, over 3873502.17 frames. ], batch size: 93, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:32:20,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=68100.0, ans=0.2 2024-08-09 15:32:48,171 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=7.767e-03 2024-08-09 15:32:49,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=68300.0, ans=0.0 2024-08-09 15:32:59,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=68300.0, ans=0.0 2024-08-09 15:33:09,456 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS 2024-08-09 15:33:14,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-08-09 15:33:15,623 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 23 from Vox, 21 from AS 2024-08-09 15:33:17,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6850, loss[loss=0.1239, beats_loss=0.01504, ecapa_loss=0.0004103, whisper_loss=0.1047, over 23521.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.0133, ecapa_loss=0.0004612, whisper_loss=0.1068, over 3904999.78 frames. 
], batch size: 92, lr: 3.87e-02, grad_scale: 256.0 2024-08-09 15:33:29,237 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.420e-02 2024-08-09 15:33:38,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=68600.0, ans=0.0 2024-08-09 15:33:39,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=68600.0, ans=0.125 2024-08-09 15:33:54,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5 2024-08-09 15:34:00,690 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 from AS 2024-08-09 15:34:21,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=15.0 2024-08-09 15:34:27,658 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 from AS 2024-08-09 15:34:33,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 3.018e+01 3.583e+01 4.075e+01 7.184e+01, threshold=7.167e+01, percent-clipped=2.0 2024-08-09 15:34:33,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6900, loss[loss=0.1279, beats_loss=0.01114, ecapa_loss=0.0005272, whisper_loss=0.1115, over 15802.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01333, ecapa_loss=0.0004607, whisper_loss=0.1064, over 3894702.77 frames. ], batch size: 64, lr: 3.86e-02, grad_scale: 256.0 2024-08-09 15:34:56,215 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 15:35:20,923 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 25 from Vox, 28 from AS 2024-08-09 15:35:24,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69300.0, ans=0.125 2024-08-09 15:35:49,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 6950, loss[loss=0.1237, beats_loss=0.01582, ecapa_loss=0.0004061, whisper_loss=0.1038, over 19240.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01329, ecapa_loss=0.0004581, whisper_loss=0.107, over 3916964.91 frames. ], batch size: 74, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:36:16,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69600.0, ans=0.125 2024-08-09 15:36:26,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=69700.0, ans=0.125 2024-08-09 15:36:30,501 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 from AS 2024-08-09 15:36:41,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=69800.0, ans=0.125 2024-08-09 15:36:43,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=69800.0, ans=0.125 2024-08-09 15:36:44,470 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 14 from Vox, 27 from AS 2024-08-09 15:36:46,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=69800.0, ans=0.125 2024-08-09 15:37:07,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.090e+01 3.523e+01 4.430e+01 8.295e+01, threshold=7.046e+01, percent-clipped=3.0 2024-08-09 15:37:07,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7000, loss[loss=0.1273, beats_loss=0.01338, ecapa_loss=0.0005065, whisper_loss=0.1089, over 17871.00 frames. ], tot_loss[loss=0.1247, beats_loss=0.0133, ecapa_loss=0.000456, whisper_loss=0.1068, over 3879954.32 frames. ], batch size: 71, lr: 3.85e-02, grad_scale: 256.0 2024-08-09 15:37:11,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=70000.0, ans=0.035 2024-08-09 15:37:14,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=12.0 2024-08-09 15:37:32,825 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-09 15:37:49,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=70200.0, ans=0.125 2024-08-09 15:37:52,528 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 15:37:54,205 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 from AS 2024-08-09 15:38:02,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=70300.0, ans=0.2 2024-08-09 15:38:08,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.30 vs. 
limit=15.0 2024-08-09 15:38:29,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7050, loss[loss=0.1347, beats_loss=0.01152, ecapa_loss=0.000425, whisper_loss=0.119, over 20100.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.01338, ecapa_loss=0.0004542, whisper_loss=0.1062, over 3871879.83 frames. ], batch size: 79, lr: 3.84e-02, grad_scale: 256.0 2024-08-09 15:38:42,201 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 from AS 2024-08-09 15:39:00,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=15.0 2024-08-09 15:39:01,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70700.0, ans=0.1 2024-08-09 15:39:16,147 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 11 from Vox, 47 from AS 2024-08-09 15:39:21,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=70800.0, ans=0.2 2024-08-09 15:39:34,200 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 9 from Vox, 31 from AS 2024-08-09 15:39:43,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.931e+01 3.439e+01 4.149e+01 6.385e+01, threshold=6.878e+01, percent-clipped=0.0 2024-08-09 15:39:43,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7100, loss[loss=0.1268, beats_loss=0.0107, ecapa_loss=0.0004464, whisper_loss=0.1116, over 18939.00 frames. ], tot_loss[loss=0.1241, beats_loss=0.0134, ecapa_loss=0.0004485, whisper_loss=0.1062, over 3847650.37 frames. 
], batch size: 75, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:39:59,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=71100.0, ans=0.125 2024-08-09 15:40:02,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=71100.0, ans=0.125 2024-08-09 15:40:15,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=71200.0, ans=0.125 2024-08-09 15:40:56,818 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 from AS 2024-08-09 15:41:00,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7150, loss[loss=0.148, beats_loss=0.01015, ecapa_loss=0.0005459, whisper_loss=0.1323, over 21401.00 frames. ], tot_loss[loss=0.1239, beats_loss=0.01338, ecapa_loss=0.0004445, whisper_loss=0.1061, over 3856366.77 frames. ], batch size: 86, lr: 3.83e-02, grad_scale: 256.0 2024-08-09 15:41:01,008 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 from AS 2024-08-09 15:41:01,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-08-09 15:41:17,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=71600.0, ans=0.125 2024-08-09 15:41:24,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=71600.0, ans=0.125 2024-08-09 15:41:40,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=71700.0, ans=0.125 2024-08-09 15:42:02,187 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 26 from Vox, 39 from AS 2024-08-09 15:42:02,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=71900.0, ans=0.2 2024-08-09 15:42:07,527 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 25 from Vox, 33 from AS 2024-08-09 15:42:09,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.942e-02 2024-08-09 15:42:16,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-09 15:42:21,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 3.087e+01 3.536e+01 4.239e+01 7.384e+01, threshold=7.073e+01, percent-clipped=1.0 2024-08-09 15:42:21,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7200, loss[loss=0.1063, beats_loss=0.01498, ecapa_loss=0.0004468, whisper_loss=0.08681, over 21666.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01332, ecapa_loss=0.000447, whisper_loss=0.1063, over 3865961.19 frames. ], batch size: 90, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:42:24,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.32 vs. 
limit=15.0 2024-08-09 15:42:38,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=72100.0, ans=0.025 2024-08-09 15:42:40,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=72100.0, ans=0.04949747468305833 2024-08-09 15:43:04,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=72200.0, ans=0.0 2024-08-09 15:43:07,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72200.0, ans=0.125 2024-08-09 15:43:19,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=72300.0, ans=0.0 2024-08-09 15:43:21,595 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-09 15:43:42,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=15.0 2024-08-09 15:43:46,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7250, loss[loss=0.1217, beats_loss=0.01563, ecapa_loss=0.0003498, whisper_loss=0.1026, over 20591.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01326, ecapa_loss=0.0004463, whisper_loss=0.1066, over 3916630.58 frames. ], batch size: 78, lr: 3.82e-02, grad_scale: 256.0 2024-08-09 15:43:53,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=72500.0, ans=0.125 2024-08-09 15:43:55,388 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
21 from LS+wenet, 26 from Vox, 40 from AS 2024-08-09 15:43:59,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=72500.0, ans=0.2 2024-08-09 15:44:17,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=72600.0, ans=0.0 2024-08-09 15:44:30,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=72700.0, ans=0.0 2024-08-09 15:44:35,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=72700.0, ans=0.0 2024-08-09 15:44:50,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=15.0 2024-08-09 15:44:58,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=72800.0, ans=15.0 2024-08-09 15:45:01,259 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 14 from LS+wenet, 22 from Vox, 36 from AS 2024-08-09 15:45:07,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=72900.0, ans=0.0 2024-08-09 15:45:20,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.182e+01 3.061e+01 3.709e+01 4.320e+01 7.317e+01, threshold=7.418e+01, percent-clipped=1.0 2024-08-09 15:45:20,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7300, loss[loss=0.1083, beats_loss=0.01578, ecapa_loss=0.0004481, whisper_loss=0.08803, over 21910.00 frames. ], tot_loss[loss=0.1243, beats_loss=0.01327, ecapa_loss=0.000447, whisper_loss=0.1065, over 3914758.59 frames. 
], batch size: 91, lr: 3.81e-02, grad_scale: 256.0 2024-08-09 15:45:20,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=73000.0, ans=0.0 2024-08-09 15:45:22,503 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 from AS 2024-08-09 15:45:37,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=73100.0, ans=0.125 2024-08-09 15:46:29,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=73300.0, ans=0.0 2024-08-09 15:46:36,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. limit=15.0 2024-08-09 15:46:42,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-08-09 15:46:52,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7350, loss[loss=0.1016, beats_loss=0.01727, ecapa_loss=0.0003393, whisper_loss=0.08091, over 22058.00 frames. ], tot_loss[loss=0.124, beats_loss=0.01337, ecapa_loss=0.0004455, whisper_loss=0.1061, over 3904620.01 frames. ], batch size: 90, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:47:12,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=73600.0, ans=0.2 2024-08-09 15:47:14,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=73600.0, ans=0.125 2024-08-09 15:47:14,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=73600.0, ans=0.2 2024-08-09 15:47:35,787 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 17 from Vox, 30 from AS 2024-08-09 15:47:39,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=73700.0, ans=0.0 2024-08-09 15:47:41,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=73800.0, ans=0.125 2024-08-09 15:47:47,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=73800.0, ans=0.1 2024-08-09 15:47:54,237 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.283e+00 2024-08-09 15:48:01,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73900.0, ans=0.0 2024-08-09 15:48:14,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.73 vs. limit=22.5 2024-08-09 15:48:21,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.843e+01 3.393e+01 4.039e+01 7.371e+01, threshold=6.786e+01, percent-clipped=0.0 2024-08-09 15:48:21,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7400, loss[loss=0.1271, beats_loss=0.01207, ecapa_loss=0.0004559, whisper_loss=0.1105, over 13954.00 frames. ], tot_loss[loss=0.1245, beats_loss=0.01334, ecapa_loss=0.0004498, whisper_loss=0.1067, over 3914439.13 frames. 
], batch size: 55, lr: 3.80e-02, grad_scale: 256.0 2024-08-09 15:48:27,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=74000.0, ans=15.0 2024-08-09 15:48:30,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=74000.0, ans=0.0 2024-08-09 15:48:33,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=74000.0, ans=0.2 2024-08-09 15:48:42,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=74100.0, ans=0.0 2024-08-09 15:48:43,608 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 from AS 2024-08-09 15:48:46,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=74100.0, ans=0.0 2024-08-09 15:48:54,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-09 15:49:07,710 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 from AS 2024-08-09 15:49:56,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7450, loss[loss=0.1102, beats_loss=0.01494, ecapa_loss=0.0004322, whisper_loss=0.09089, over 16815.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.01329, ecapa_loss=0.0004476, whisper_loss=0.1067, over 3894813.92 frames. ], batch size: 71, lr: 3.79e-02, grad_scale: 256.0 2024-08-09 15:50:02,342 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 22 from Vox, 33 from AS 2024-08-09 15:50:17,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=74600.0, ans=0.1 2024-08-09 15:50:18,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=74600.0, ans=0.125 2024-08-09 15:50:19,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=74600.0, ans=0.2 2024-08-09 15:50:43,648 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 10 from Vox, 26 from AS 2024-08-09 15:50:44,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=15.0 2024-08-09 15:50:53,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=74800.0, ans=0.07 2024-08-09 15:51:01,602 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:51:09,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=74900.0, ans=0.125 2024-08-09 15:51:13,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 3.130e+01 3.399e+01 4.155e+01 7.076e+01, threshold=6.798e+01, percent-clipped=1.0 2024-08-09 15:51:13,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7500, loss[loss=0.1164, beats_loss=0.01294, ecapa_loss=0.00043, whisper_loss=0.09912, over 20636.00 frames. ], tot_loss[loss=0.1244, beats_loss=0.0133, ecapa_loss=0.0004481, whisper_loss=0.1066, over 3878378.75 frames. 
], batch size: 81, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:51:26,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=75000.0, ans=0.125 2024-08-09 15:51:31,830 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 35 from LS+wenet, 26 from Vox, 25 from AS 2024-08-09 15:51:34,668 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 22 from Vox, 15 from AS 2024-08-09 15:51:37,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75100.0, ans=0.1 2024-08-09 15:51:58,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=75300.0, ans=0.125 2024-08-09 15:52:02,865 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 23 from Vox, 30 from AS 2024-08-09 15:52:07,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=75300.0, ans=0.125 2024-08-09 15:52:08,244 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-09 15:52:12,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75400.0, ans=0.1 2024-08-09 15:52:21,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2024-08-09 15:52:24,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7550, loss[loss=0.09679, beats_loss=0.01513, ecapa_loss=0.000641, whisper_loss=0.07526, over 19578.00 frames. ], tot_loss[loss=0.1236, beats_loss=0.01338, ecapa_loss=0.0004477, whisper_loss=0.1057, over 3846819.86 frames. 
], batch size: 91, lr: 3.78e-02, grad_scale: 256.0 2024-08-09 15:52:25,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=75500.0, ans=0.2 2024-08-09 15:52:44,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=75600.0, ans=0.125 2024-08-09 15:52:44,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=75600.0, ans=0.2 2024-08-09 15:52:57,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2024-08-09 15:53:06,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=75800.0, ans=0.2 2024-08-09 15:53:31,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=75900.0, ans=0.0 2024-08-09 15:53:32,896 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 15:53:35,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 3.036e+01 3.542e+01 4.226e+01 5.898e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 15:53:35,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7600, loss[loss=0.1384, beats_loss=0.01119, ecapa_loss=0.0004736, whisper_loss=0.1224, over 21824.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01334, ecapa_loss=0.0004445, whisper_loss=0.1059, over 3840030.14 frames. ], batch size: 88, lr: 3.77e-02, grad_scale: 256.0 2024-08-09 15:53:52,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=76100.0, ans=0.125 2024-08-09 15:54:00,918 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
20 from LS+wenet, 19 from Vox, 36 from AS 2024-08-09 15:54:19,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2024-08-09 15:54:22,119 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.773e-02 2024-08-09 15:54:36,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.45 vs. limit=22.5 2024-08-09 15:54:42,306 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.193e-02 2024-08-09 15:54:44,989 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 21 from Vox, 23 from AS 2024-08-09 15:54:46,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7650, loss[loss=0.09961, beats_loss=0.01499, ecapa_loss=0.0004723, whisper_loss=0.0799, over 13286.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01327, ecapa_loss=0.0004464, whisper_loss=0.106, over 3829053.07 frames. ], batch size: 56, lr: 3.77e-02, grad_scale: 256.0 2024-08-09 15:54:52,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-09 15:55:14,056 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
19 from LS+wenet, 24 from Vox, 41 from AS 2024-08-09 15:55:25,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=76700.0, ans=0.0 2024-08-09 15:55:28,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=76800.0, ans=0.2 2024-08-09 15:55:31,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=76800.0, ans=0.2 2024-08-09 15:55:43,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=76900.0, ans=0.2 2024-08-09 15:55:44,516 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 24 from Vox, 27 from AS 2024-08-09 15:55:49,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=76900.0, ans=0.125 2024-08-09 15:55:53,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=76900.0, ans=15.0 2024-08-09 15:55:55,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.321e+01 3.065e+01 3.556e+01 4.140e+01 7.466e+01, threshold=7.113e+01, percent-clipped=1.0 2024-08-09 15:55:55,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7700, loss[loss=0.1437, beats_loss=0.01129, ecapa_loss=0.0004444, whisper_loss=0.128, over 22609.00 frames. ], tot_loss[loss=0.1233, beats_loss=0.01336, ecapa_loss=0.0004451, whisper_loss=0.1055, over 3856174.55 frames. 
], batch size: 90, lr: 3.76e-02, grad_scale: 256.0 2024-08-09 15:55:55,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=77000.0, ans=0.125 2024-08-09 15:55:57,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=77000.0, ans=0.125 2024-08-09 15:55:58,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=77000.0, ans=0.0 2024-08-09 15:56:08,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=77100.0, ans=0.2 2024-08-09 15:56:21,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=77100.0, ans=0.2 2024-08-09 15:56:25,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=12.0 2024-08-09 15:56:34,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.24 vs. limit=10.0 2024-08-09 15:56:44,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=77300.0, ans=0.125 2024-08-09 15:56:48,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=77300.0, ans=0.2 2024-08-09 15:56:51,529 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 from AS 2024-08-09 15:56:53,487 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 23 from Vox, 31 from AS 2024-08-09 15:56:55,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=77400.0, ans=0.2 2024-08-09 15:56:57,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77400.0, ans=0.1 2024-08-09 15:57:06,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=77500.0, ans=0.125 2024-08-09 15:57:07,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7750, loss[loss=0.1084, beats_loss=0.01387, ecapa_loss=0.0004209, whisper_loss=0.09031, over 17600.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01337, ecapa_loss=0.0004427, whisper_loss=0.1049, over 3859331.00 frames. ], batch size: 71, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:57:09,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=77500.0, ans=0.0 2024-08-09 15:57:19,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=77500.0, ans=0.2 2024-08-09 15:57:19,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=77500.0, ans=0.2 2024-08-09 15:57:24,636 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 from AS 2024-08-09 15:57:56,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77800.0, ans=0.1 2024-08-09 15:58:17,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.915e+01 3.303e+01 4.126e+01 7.711e+01, threshold=6.607e+01, percent-clipped=1.0 2024-08-09 15:58:17,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7800, loss[loss=0.1284, beats_loss=0.01329, ecapa_loss=0.0004632, whisper_loss=0.1104, over 13561.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01332, ecapa_loss=0.0004416, whisper_loss=0.1057, over 3857198.70 frames. ], batch size: 56, lr: 3.75e-02, grad_scale: 256.0 2024-08-09 15:58:19,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=78000.0, ans=0.025 2024-08-09 15:58:21,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=78000.0, ans=0.2 2024-08-09 15:58:29,542 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 from AS 2024-08-09 15:58:36,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78100.0, ans=0.125 2024-08-09 15:59:04,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.17 vs. limit=10.0 2024-08-09 15:59:08,426 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 15:59:09,936 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 26 from Vox, 32 from AS 2024-08-09 15:59:26,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7850, loss[loss=0.1288, beats_loss=0.0129, ecapa_loss=0.0003861, whisper_loss=0.1121, over 22393.00 frames.
], tot_loss[loss=0.1225, beats_loss=0.01342, ecapa_loss=0.0004404, whisper_loss=0.1047, over 3861899.29 frames. ], batch size: 88, lr: 3.74e-02, grad_scale: 256.0 2024-08-09 15:59:33,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=78500.0, ans=0.2 2024-08-09 15:59:36,624 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 19 from Vox, 18 from AS 2024-08-09 15:59:39,370 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 from AS 2024-08-09 15:59:53,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2024-08-09 15:59:55,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.20 vs. limit=22.5 2024-08-09 16:00:05,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=78700.0, ans=10.0 2024-08-09 16:00:09,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=78800.0, ans=0.125 2024-08-09 16:00:11,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=78800.0, ans=0.05 2024-08-09 16:00:25,334 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS 2024-08-09 16:00:31,220 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
23 from LS+wenet, 23 from Vox, 18 from AS 2024-08-09 16:00:34,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=79000.0, ans=10.0 2024-08-09 16:00:35,140 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 3.036e+01 3.521e+01 4.450e+01 7.582e+01, threshold=7.043e+01, percent-clipped=4.0 2024-08-09 16:00:35,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7900, loss[loss=0.1268, beats_loss=0.01485, ecapa_loss=0.0004725, whisper_loss=0.1072, over 20453.00 frames. ], tot_loss[loss=0.1237, beats_loss=0.01329, ecapa_loss=0.0004428, whisper_loss=0.106, over 3871170.14 frames. ], batch size: 84, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:00:45,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=79000.0, ans=0.0 2024-08-09 16:01:08,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=79200.0, ans=0.125 2024-08-09 16:01:10,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=79200.0, ans=0.125 2024-08-09 16:01:16,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=79300.0, ans=0.0 2024-08-09 16:01:39,771 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 16:01:42,615 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 16:01:43,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 7950, loss[loss=0.1296, beats_loss=0.0116, ecapa_loss=0.0004497, whisper_loss=0.1135, over 19661.00 frames. ], tot_loss[loss=0.1249, beats_loss=0.01319, ecapa_loss=0.0004432, whisper_loss=0.1073, over 3886497.35 frames.
], batch size: 79, lr: 3.73e-02, grad_scale: 256.0 2024-08-09 16:01:49,438 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 from AS 2024-08-09 16:01:52,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=79500.0, ans=0.0 2024-08-09 16:02:00,179 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 from AS 2024-08-09 16:02:05,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79600.0, ans=0.1 2024-08-09 16:02:29,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=79800.0, ans=0.125 2024-08-09 16:02:35,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=79800.0, ans=0.125 2024-08-09 16:02:36,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=79900.0, ans=0.125 2024-08-09 16:02:38,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs.
limit=6.0 2024-08-09 16:02:47,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=79900.0, ans=0.2 2024-08-09 16:02:50,895 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-8000.pt 2024-08-09 16:02:54,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+01 3.066e+01 3.561e+01 4.217e+01 9.530e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 16:02:54,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8000, loss[loss=0.1031, beats_loss=0.01619, ecapa_loss=0.0002888, whisper_loss=0.08401, over 15247.00 frames. ], tot_loss[loss=0.1246, beats_loss=0.01322, ecapa_loss=0.000435, whisper_loss=0.1071, over 3883757.89 frames. ], batch size: 59, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:02:56,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=80000.0, ans=0.0 2024-08-09 16:03:07,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-08-09 16:03:50,075 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 from AS 2024-08-09 16:03:51,617 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 from AS 2024-08-09 16:04:01,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2024-08-09 16:04:01,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8050, loss[loss=0.1389, beats_loss=0.01153, ecapa_loss=0.0005055, whisper_loss=0.1223, over 20080.00 frames.
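On the Clipping_scale records: each one reports five quartiles of recent gradient norms ([min, 25%, median, 75%, max]) plus a clipping threshold, and throughout this log the threshold equals Clipping_scale times the median (e.g. 2.0 × 3.521e+01 ≈ 7.043e+01 at batch 7900). The sketch below reconstructs that relationship from the logged numbers; it is my inference, not the actual optim.py implementation:

```python
# Reconstruct the logged clipping threshold from the grad-norm quartiles.
# quartiles = [min, 25%, median, 75%, max]; in this log the threshold
# matches clipping_scale * median (an inference from the logged values,
# not a quote of icefall's optim.py).
def clipping_threshold(quartiles, clipping_scale=2.0):
    median = quartiles[2]
    return clipping_scale * median

# Batch 7900 above: quartiles 2.186e+01 ... 7.582e+01, threshold=7.043e+01
thr = clipping_threshold([21.86, 30.36, 35.21, 44.50, 75.82])  # ≈ 70.42
```

Under this reading, percent-clipped is simply the fraction of recent batches whose gradient norm exceeded that median-derived threshold, which explains why it stays low (0-5%) in the records above.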
], tot_loss[loss=0.124, beats_loss=0.01325, ecapa_loss=0.000435, whisper_loss=0.1064, over 3887220.17 frames. ], batch size: 80, lr: 3.72e-02, grad_scale: 512.0 2024-08-09 16:04:02,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=80500.0, ans=0.125 2024-08-09 16:04:04,522 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 14 from Vox, 43 from AS 2024-08-09 16:04:11,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2024-08-09 16:04:22,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-09 16:04:35,124 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 from AS 2024-08-09 16:04:49,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=22.5 2024-08-09 16:05:10,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 3.011e+01 3.515e+01 4.189e+01 8.391e+01, threshold=7.029e+01, percent-clipped=0.0 2024-08-09 16:05:10,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8100, loss[loss=0.08753, beats_loss=0.01652, ecapa_loss=0.0004698, whisper_loss=0.06631, over 12520.00 frames. ], tot_loss[loss=0.1238, beats_loss=0.01327, ecapa_loss=0.0004338, whisper_loss=0.1062, over 3870264.46 frames. ], batch size: 54, lr: 3.71e-02, grad_scale: 512.0 2024-08-09 16:05:14,942 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts.
16 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 16:05:16,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=81000.0, ans=0.125 2024-08-09 16:05:19,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5 2024-08-09 16:05:21,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=81000.0, ans=0.0 2024-08-09 16:05:23,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81100.0, ans=0.1 2024-08-09 16:05:28,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81100.0, ans=0.125 2024-08-09 16:05:29,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81100.0, ans=0.125 2024-08-09 16:05:33,637 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-09 16:05:33,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=81100.0, ans=0.125 2024-08-09 16:05:40,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=81200.0, ans=0.0 2024-08-09 16:05:51,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.32 vs.
limit=22.5 2024-08-09 16:05:52,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=81300.0, ans=0.09899494936611666 2024-08-09 16:06:03,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81400.0, ans=0.125 2024-08-09 16:06:19,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8150, loss[loss=0.1185, beats_loss=0.01428, ecapa_loss=0.0004032, whisper_loss=0.1002, over 21722.00 frames. ], tot_loss[loss=0.1235, beats_loss=0.01319, ecapa_loss=0.0004355, whisper_loss=0.1059, over 3876604.10 frames. ], batch size: 89, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:06:27,655 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 16:06:31,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=81600.0, ans=0.05 2024-08-09 16:06:57,240 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 from AS 2024-08-09 16:06:58,671 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 from AS 2024-08-09 16:07:03,863 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS 2024-08-09 16:07:04,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=81800.0, ans=0.05 2024-08-09 16:07:07,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2024-08-09 16:07:24,972 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
30 from LS+wenet, 18 from Vox, 39 from AS 2024-08-09 16:07:27,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.111e+01 3.553e+01 4.149e+01 8.297e+01, threshold=7.106e+01, percent-clipped=2.0 2024-08-09 16:07:27,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8200, loss[loss=0.1202, beats_loss=0.01274, ecapa_loss=0.0004138, whisper_loss=0.1034, over 15459.00 frames. ], tot_loss[loss=0.123, beats_loss=0.01323, ecapa_loss=0.0004354, whisper_loss=0.1054, over 3884939.83 frames. ], batch size: 59, lr: 3.70e-02, grad_scale: 512.0 2024-08-09 16:07:39,931 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 from AS 2024-08-09 16:07:41,428 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS 2024-08-09 16:07:50,776 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 from AS 2024-08-09 16:07:54,761 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS 2024-08-09 16:07:59,259 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 18 from Vox, 37 from AS 2024-08-09 16:08:29,588 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=7.675e-03 2024-08-09 16:08:35,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82500.0, ans=0.1 2024-08-09 16:08:36,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8250, loss[loss=0.1275, beats_loss=0.01131, ecapa_loss=0.0004594, whisper_loss=0.1116, over 17740.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01316, ecapa_loss=0.0004365, whisper_loss=0.1051, over 3874576.51 frames.
], batch size: 70, lr: 3.69e-02, grad_scale: 512.0 2024-08-09 16:08:49,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=82600.0, ans=0.2 2024-08-09 16:08:54,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2024-08-09 16:08:58,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82600.0, ans=0.1 2024-08-09 16:09:03,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=12.0 2024-08-09 16:09:11,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=82700.0, ans=0.125 2024-08-09 16:09:12,176 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 from AS 2024-08-09 16:09:12,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=82700.0, ans=0.0 2024-08-09 16:09:19,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=82800.0, ans=0.0 2024-08-09 16:09:19,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=82800.0, ans=0.125 2024-08-09 16:09:25,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=82800.0, ans=0.2 2024-08-09 16:09:35,079 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 from AS 2024-08-09 16:09:35,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs.
limit=15.0 2024-08-09 16:09:44,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.998e+01 3.523e+01 3.969e+01 6.917e+01, threshold=7.045e+01, percent-clipped=0.0 2024-08-09 16:09:44,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8300, loss[loss=0.141, beats_loss=0.01173, ecapa_loss=0.0003704, whisper_loss=0.1255, over 16393.00 frames. ], tot_loss[loss=0.1231, beats_loss=0.01317, ecapa_loss=0.0004334, whisper_loss=0.1056, over 3896021.20 frames. ], batch size: 61, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:09:46,286 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 from AS 2024-08-09 16:09:54,272 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 22 from Vox, 21 from AS 2024-08-09 16:09:55,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83000.0, ans=0.1 2024-08-09 16:10:36,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=83300.0, ans=0.0 2024-08-09 16:10:57,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8350, loss[loss=0.148, beats_loss=0.01435, ecapa_loss=0.0004137, whisper_loss=0.1295, over 24315.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01322, ecapa_loss=0.0004325, whisper_loss=0.1047, over 3880042.61 frames.
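On the ScheduledFloat records: they trace regularization hyperparameters (dropout probabilities, balancer probs, skip rates, bypass scale minima) whose values depend on batch_count. A piecewise-linear schedule over batch count is enough to reproduce that kind of behaviour; the sketch below is a generic illustration with made-up breakpoints, not the scaling.py implementation:

```python
# A piecewise-linear value schedule keyed on batch count, illustrating
# how a logged quantity like "conv_skip_rate ... ans=0.0" can decay as
# training progresses. Breakpoints are invented for illustration.
def scheduled_float(batch_count, points):
    """points: sorted (batch, value) pairs; linearly interpolate between
    them, clamping to the first/last value outside the range."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if b0 <= batch_count <= b1:
            frac = (batch_count - b0) / (b1 - b0)
            return v0 + frac * (v1 - v0)

# e.g. a skip rate ramping from 0.2 down to 0.0 over the first 4000 batches:
schedule = [(0.0, 0.2), (4000.0, 0.0)]
rate = scheduled_float(2000.0, schedule)  # 0.1 at the midpoint
```

This also explains why the same parameter name appears repeatedly with identical `ans` values: once batch_count passes the final breakpoint, the schedule is flat.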
], batch size: 93, lr: 3.68e-02, grad_scale: 512.0 2024-08-09 16:11:17,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83600.0, ans=0.1 2024-08-09 16:11:34,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=83700.0, ans=0.1 2024-08-09 16:11:37,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=83700.0, ans=0.2 2024-08-09 16:11:40,766 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 20 from Vox, 20 from AS 2024-08-09 16:11:45,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2024-08-09 16:11:45,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0 2024-08-09 16:11:49,607 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-09 16:12:07,236 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 19 from Vox, 37 from AS 2024-08-09 16:12:08,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.077e+01 3.401e+01 4.133e+01 6.317e+01, threshold=6.802e+01, percent-clipped=0.0 2024-08-09 16:12:08,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8400, loss[loss=0.1388, beats_loss=0.01317, ecapa_loss=0.0004112, whisper_loss=0.1215, over 21010.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01318, ecapa_loss=0.0004335, whisper_loss=0.1051, over 3899626.64 frames. ], batch size: 86, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:12:45,115 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
22 from LS+wenet, 13 from Vox, 28 from AS 2024-08-09 16:12:59,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-08-09 16:13:15,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=84400.0, ans=0.07 2024-08-09 16:13:28,923 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 from AS 2024-08-09 16:13:30,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8450, loss[loss=0.1433, beats_loss=0.01321, ecapa_loss=0.0004255, whisper_loss=0.1258, over 21707.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01315, ecapa_loss=0.000435, whisper_loss=0.105, over 3896861.94 frames. ], batch size: 88, lr: 3.67e-02, grad_scale: 512.0 2024-08-09 16:13:32,190 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 15 from Vox, 34 from AS 2024-08-09 16:13:48,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=84600.0, ans=0.025 2024-08-09 16:13:59,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=84700.0, ans=0.125 2024-08-09 16:14:11,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-08-09 16:14:16,018 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 16:14:17,802 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
32 from LS+wenet, 23 from Vox, 39 from AS 2024-08-09 16:14:24,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=84800.0, ans=0.0 2024-08-09 16:14:38,648 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 from AS 2024-08-09 16:14:50,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2024-08-09 16:14:51,040 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.954e+01 3.407e+01 4.304e+01 7.894e+01, threshold=6.814e+01, percent-clipped=2.0 2024-08-09 16:14:51,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8500, loss[loss=0.1331, beats_loss=0.01532, ecapa_loss=0.0002965, whisper_loss=0.1149, over 15893.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.01317, ecapa_loss=0.0004331, whisper_loss=0.1051, over 3908409.38 frames. ], batch size: 58, lr: 3.66e-02, grad_scale: 512.0 2024-08-09 16:14:51,275 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 from AS 2024-08-09 16:15:01,763 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-09 16:15:21,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=85100.0, ans=0.0 2024-08-09 16:15:45,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=22.5 2024-08-09 16:15:48,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs.
limit=15.0 2024-08-09 16:16:13,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=85400.0, ans=0.125 2024-08-09 16:16:14,946 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 19 from Vox, 31 from AS 2024-08-09 16:16:15,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=85400.0, ans=0.07 2024-08-09 16:16:25,515 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 16:16:26,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8550, loss[loss=0.1184, beats_loss=0.01232, ecapa_loss=0.000464, whisper_loss=0.1015, over 16781.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01319, ecapa_loss=0.000433, whisper_loss=0.1045, over 3891061.23 frames. ], batch size: 67, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:16:28,458 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 from AS 2024-08-09 16:16:45,918 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS 2024-08-09 16:17:10,491 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS 2024-08-09 16:17:29,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=85800.0, ans=0.125 2024-08-09 16:17:58,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-08-09 16:17:58,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-09 16:18:03,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs.
limit=15.0 2024-08-09 16:18:03,685 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.923e+01 3.374e+01 4.145e+01 6.398e+01, threshold=6.748e+01, percent-clipped=0.0 2024-08-09 16:18:03,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8600, loss[loss=0.1037, beats_loss=0.01911, ecapa_loss=0.0002948, whisper_loss=0.08164, over 22025.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01324, ecapa_loss=0.0004309, whisper_loss=0.1041, over 3903302.70 frames. ], batch size: 86, lr: 3.65e-02, grad_scale: 512.0 2024-08-09 16:18:06,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=86000.0, ans=0.0 2024-08-09 16:18:11,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=86000.0, ans=0.09899494936611666 2024-08-09 16:19:00,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=86300.0, ans=0.2 2024-08-09 16:19:16,300 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.965e+00 2024-08-09 16:19:36,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=86400.0, ans=0.125 2024-08-09 16:19:38,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86500.0, ans=0.125 2024-08-09 16:19:40,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8650, loss[loss=0.1188, beats_loss=0.01366, ecapa_loss=0.0004061, whisper_loss=0.101, over 21767.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01331, ecapa_loss=0.0004286, whisper_loss=0.1039, over 3874586.39 frames. 
], batch size: 87, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:19:41,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=86500.0, ans=0.0 2024-08-09 16:19:47,313 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 27 from Vox, 25 from AS 2024-08-09 16:19:51,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.91 vs. limit=22.5 2024-08-09 16:19:51,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-09 16:19:54,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. limit=22.5 2024-08-09 16:20:10,397 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-09 16:20:10,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-08-09 16:20:16,159 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 from AS 2024-08-09 16:20:29,428 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS 2024-08-09 16:20:57,742 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 from AS 2024-08-09 16:20:59,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.913e+01 3.504e+01 4.209e+01 7.626e+01, threshold=7.009e+01, percent-clipped=5.0 2024-08-09 16:20:59,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8700, loss[loss=0.1293, beats_loss=0.01274, ecapa_loss=0.0003577, whisper_loss=0.113, over 14579.00 frames.
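On the Whitening records: each fires when a measured "metric" exceeds its scheduled limit (e.g. metric=15.45 vs. limit=15.0 just above), i.e. when the activations in that module drift away from a whitened (isotropic) covariance. One standard way to quantify that drift is the eigenvalue-dispersion ratio sketched below; this is an illustrative metric of my own choosing, not necessarily the one computed in scaling.py:

```python
# An illustrative whitening metric over the eigenvalues of a feature
# covariance: num_channels * sum(eig^2) / sum(eig)^2. It equals 1.0 when
# the covariance is perfectly white (all eigenvalues equal) and grows
# toward num_channels as the energy concentrates in one direction.
# (Illustrative choice of metric; an assumption, not scaling.py's code.)
def whitening_metric(eigenvalues):
    d = len(eigenvalues)
    s1 = sum(eigenvalues)
    s2 = sum(e * e for e in eigenvalues)
    return d * s2 / (s1 * s1)

white = whitening_metric([1.0, 1.0, 1.0, 1.0])    # 1.0: already white
skewed = whitening_metric([4.0, 0.0, 0.0, 0.0])   # 4.0: fully collapsed
```

Under any metric of this family, the log entries are diagnostics: a "metric vs. limit" line with metric above the limit marks a module whose whitening penalty is active at that step.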
], tot_loss[loss=0.1219, beats_loss=0.01334, ecapa_loss=0.000428, whisper_loss=0.1043, over 3872628.50 frames. ], batch size: 54, lr: 3.64e-02, grad_scale: 512.0 2024-08-09 16:21:27,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=12.0 2024-08-09 16:21:30,331 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 from AS 2024-08-09 16:21:32,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=87100.0, ans=0.125 2024-08-09 16:22:01,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2024-08-09 16:22:19,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=87400.0, ans=0.125 2024-08-09 16:22:27,618 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 33 from Vox, 28 from AS 2024-08-09 16:22:28,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8750, loss[loss=0.1303, beats_loss=0.009797, ecapa_loss=0.0005284, whisper_loss=0.1152, over 21966.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01331, ecapa_loss=0.0004296, whisper_loss=0.104, over 3851126.47 frames. ], batch size: 90, lr: 3.63e-02, grad_scale: 512.0 2024-08-09 16:22:31,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-09 16:22:45,166 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 from AS 2024-08-09 16:22:49,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=87600.0, ans=0.0 2024-08-09 16:23:05,637 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts.
30 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-09 16:23:16,679 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-09 16:23:29,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=87900.0, ans=0.125 2024-08-09 16:23:32,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=87900.0, ans=0.0 2024-08-09 16:23:39,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.881e+01 3.394e+01 4.029e+01 7.137e+01, threshold=6.788e+01, percent-clipped=1.0 2024-08-09 16:23:39,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8800, loss[loss=0.1375, beats_loss=0.01053, ecapa_loss=0.0004695, whisper_loss=0.1223, over 23062.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.0134, ecapa_loss=0.000423, whisper_loss=0.1039, over 3854913.93 frames. ], batch size: 90, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:23:48,572 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 16:24:00,925 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-09 16:24:06,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.82 vs. limit=22.5 2024-08-09 16:24:14,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2024-08-09 16:24:34,044 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 16:24:51,507 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8850, loss[loss=0.1383, beats_loss=0.01132, ecapa_loss=0.0003036, whisper_loss=0.124, over 15443.00 frames. 
], tot_loss[loss=0.1212, beats_loss=0.01341, ecapa_loss=0.0004226, whisper_loss=0.1036, over 3858617.32 frames. ], batch size: 54, lr: 3.62e-02, grad_scale: 512.0 2024-08-09 16:24:51,730 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-09 16:25:02,534 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 16:25:04,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=88600.0, ans=0.0 2024-08-09 16:25:59,488 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 16:26:01,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.897e+01 3.367e+01 4.055e+01 6.951e+01, threshold=6.734e+01, percent-clipped=1.0 2024-08-09 16:26:01,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8900, loss[loss=0.1133, beats_loss=0.01318, ecapa_loss=0.0004368, whisper_loss=0.09578, over 15031.00 frames. ], tot_loss[loss=0.1211, beats_loss=0.01333, ecapa_loss=0.0004248, whisper_loss=0.1035, over 3836685.37 frames. ], batch size: 62, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:26:09,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-08-09 16:26:13,316 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 16:26:13,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=89000.0, ans=0.125 2024-08-09 16:26:17,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=89100.0, ans=0.2 2024-08-09 16:26:25,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=89100.0, ans=0.1 2024-08-09 16:26:29,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=89200.0, ans=0.125 2024-08-09 16:26:34,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=89200.0, ans=0.125 2024-08-09 16:26:59,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=89400.0, ans=0.125 2024-08-09 16:27:10,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 8950, loss[loss=0.1329, beats_loss=0.01342, ecapa_loss=0.000407, whisper_loss=0.1154, over 19390.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01332, ecapa_loss=0.0004255, whisper_loss=0.1042, over 3863272.30 frames. ], batch size: 75, lr: 3.61e-02, grad_scale: 512.0 2024-08-09 16:27:14,867 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 16:27:20,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89500.0, ans=0.0 2024-08-09 16:27:29,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-08-09 16:27:49,055 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 16:27:51,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=89800.0, ans=0.125 2024-08-09 16:27:54,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89800.0, ans=0.1 2024-08-09 16:28:06,259 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-09 16:28:15,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0 2024-08-09 16:28:20,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.962e+01 3.391e+01 3.948e+01 7.468e+01, threshold=6.781e+01, percent-clipped=1.0 2024-08-09 16:28:20,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9000, loss[loss=0.1421, beats_loss=0.01087, ecapa_loss=0.0003929, whisper_loss=0.1273, over 17564.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01331, ecapa_loss=0.0004244, whisper_loss=0.1041, over 3853045.99 frames. ], batch size: 65, lr: 3.60e-02, grad_scale: 512.0 2024-08-09 16:28:20,172 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 16:29:00,133 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on ASR_libri: loss=0.2932, beats_loss=0, ecapa_loss=0.001188, whisper_loss=0.2813, over 922467.00 frames. 2024-08-09 16:29:16,778 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on SV_voxceleb1: loss=0.01105, beats_loss=0, ecapa_loss=0.001105, whisper_loss=0, over 939242.00 frames. 
2024-08-09 16:30:37,524 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0939, 2.3048, 2.1413, 2.0007], device='cuda:0') 2024-08-09 16:31:15,714 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on AT_audioset: loss=0.03209, beats_loss=0.03209, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-09 16:31:15,719 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 16:31:29,945 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-09 16:31:32,913 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-09 16:31:38,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90100.0, ans=0.125 2024-08-09 16:31:43,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.80 vs. limit=10.0 2024-08-09 16:31:45,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-09 16:31:47,742 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 12 from Vox, 18 fro AS 2024-08-09 16:31:49,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=90200.0, ans=0.125 2024-08-09 16:31:55,989 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
8 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-09 16:32:03,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90300.0, ans=0.1 2024-08-09 16:32:24,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9050, loss[loss=0.1276, beats_loss=0.01361, ecapa_loss=0.0003463, whisper_loss=0.1105, over 14780.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01334, ecapa_loss=0.0004217, whisper_loss=0.1038, over 3859500.48 frames. ], batch size: 57, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:32:26,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=90500.0, ans=0.0 2024-08-09 16:32:33,208 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-09 16:32:35,722 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 16:32:57,850 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-09 16:33:15,358 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-09 16:33:24,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2024-08-09 16:33:27,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. limit=5.0 2024-08-09 16:33:32,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.994e+01 3.542e+01 4.086e+01 6.210e+01, threshold=7.084e+01, percent-clipped=0.0 2024-08-09 16:33:32,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9100, loss[loss=0.1119, beats_loss=0.01435, ecapa_loss=0.0003788, whisper_loss=0.0938, over 17054.00 frames. 
], tot_loss[loss=0.1217, beats_loss=0.0133, ecapa_loss=0.0004244, whisper_loss=0.1042, over 3828612.10 frames. ], batch size: 68, lr: 3.59e-02, grad_scale: 512.0 2024-08-09 16:33:49,287 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 16:33:57,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=91100.0, ans=0.0 2024-08-09 16:34:01,999 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-09 16:34:06,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.283e-02 2024-08-09 16:34:07,510 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 16:34:20,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=91300.0, ans=0.05 2024-08-09 16:34:22,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=91300.0, ans=0.125 2024-08-09 16:34:35,848 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.737e-01 2024-08-09 16:34:35,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-08-09 16:34:41,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9150, loss[loss=0.1333, beats_loss=0.009045, ecapa_loss=0.0005202, whisper_loss=0.1191, over 19854.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01334, ecapa_loss=0.0004228, whisper_loss=0.1038, over 3844893.54 frames. 
], batch size: 79, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:34:47,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=91500.0, ans=0.0 2024-08-09 16:34:50,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=91500.0, ans=0.125 2024-08-09 16:35:00,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=91600.0, ans=0.0 2024-08-09 16:35:02,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=91600.0, ans=0.125 2024-08-09 16:35:12,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=91700.0, ans=0.125 2024-08-09 16:35:14,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=91700.0, ans=0.0 2024-08-09 16:35:15,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=91700.0, ans=0.125 2024-08-09 16:35:27,665 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-09 16:35:36,931 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 16:35:42,466 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-09 16:35:42,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=91900.0, ans=0.0 2024-08-09 16:35:49,424 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.827e+01 3.202e+01 3.925e+01 7.636e+01, threshold=6.404e+01, percent-clipped=0.0 2024-08-09 16:35:49,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9200, loss[loss=0.1106, beats_loss=0.01382, ecapa_loss=0.0003208, whisper_loss=0.09362, over 15300.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01339, ecapa_loss=0.0004257, whisper_loss=0.104, over 3852101.26 frames. ], batch size: 57, lr: 3.58e-02, grad_scale: 512.0 2024-08-09 16:35:51,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92000.0, ans=0.1 2024-08-09 16:35:54,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92000.0, ans=0.1 2024-08-09 16:35:59,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0 2024-08-09 16:36:01,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=15.0 2024-08-09 16:36:18,279 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-09 16:36:32,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=92300.0, ans=0.025 2024-08-09 16:36:45,174 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 16:36:58,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9250, loss[loss=0.1095, beats_loss=0.0128, ecapa_loss=0.0005288, whisper_loss=0.09144, over 19986.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.0134, ecapa_loss=0.0004249, whisper_loss=0.1041, over 3850327.00 frames. ], batch size: 85, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:37:00,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=92500.0, ans=0.0 2024-08-09 16:37:01,797 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 16:37:11,249 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-09 16:37:11,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=92600.0, ans=0.0 2024-08-09 16:37:16,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=92600.0, ans=0.125 2024-08-09 16:37:29,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=92700.0, ans=0.125 2024-08-09 16:37:31,661 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 16:37:47,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 16:37:50,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=92800.0, ans=0.0 2024-08-09 16:38:01,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92900.0, ans=0.2 2024-08-09 16:38:07,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.067e+01 3.450e+01 4.093e+01 6.352e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 16:38:07,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9300, loss[loss=0.1304, beats_loss=0.01224, ecapa_loss=0.0004253, whisper_loss=0.114, over 23093.00 frames. ], tot_loss[loss=0.1226, beats_loss=0.0132, ecapa_loss=0.0004258, whisper_loss=0.1051, over 3856583.47 frames. ], batch size: 92, lr: 3.57e-02, grad_scale: 512.0 2024-08-09 16:38:14,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.26 vs. limit=22.5 2024-08-09 16:38:15,539 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-09 16:38:15,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=93000.0, ans=0.125 2024-08-09 16:38:20,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=93100.0, ans=0.07 2024-08-09 16:38:24,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=93100.0, ans=0.125 2024-08-09 16:38:24,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=93100.0, ans=0.125 2024-08-09 16:38:31,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=93100.0, ans=0.125 2024-08-09 16:38:37,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=93200.0, ans=0.0 2024-08-09 16:38:53,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=93300.0, ans=0.0 2024-08-09 16:38:53,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93300.0, ans=0.0 2024-08-09 16:38:55,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=93300.0, ans=0.05 2024-08-09 16:38:55,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=93300.0, ans=0.0 2024-08-09 16:39:08,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=93400.0, ans=0.125 2024-08-09 16:39:15,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9350, loss[loss=0.1108, 
beats_loss=0.01528, ecapa_loss=0.0003673, whisper_loss=0.09184, over 16865.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01322, ecapa_loss=0.0004234, whisper_loss=0.1052, over 3866089.25 frames. ], batch size: 67, lr: 3.56e-02, grad_scale: 512.0 2024-08-09 16:39:45,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=93700.0, ans=0.0 2024-08-09 16:39:54,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=93700.0, ans=0.125 2024-08-09 16:40:14,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93900.0, ans=0.1 2024-08-09 16:40:16,638 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 16:40:24,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.921e+01 3.226e+01 3.791e+01 1.210e+02, threshold=6.451e+01, percent-clipped=3.0 2024-08-09 16:40:24,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9400, loss[loss=0.1199, beats_loss=0.01396, ecapa_loss=0.0003293, whisper_loss=0.1026, over 21559.00 frames. ], tot_loss[loss=0.1224, beats_loss=0.01331, ecapa_loss=0.0004226, whisper_loss=0.1049, over 3872267.42 frames. ], batch size: 83, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:40:28,927 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 16:40:29,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.27 vs. 
limit=15.0 2024-08-09 16:40:30,478 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.015e+00 2024-08-09 16:40:34,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=94000.0, ans=0.05 2024-08-09 16:41:05,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=94300.0, ans=0.0 2024-08-09 16:41:15,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=94300.0, ans=0.125 2024-08-09 16:41:17,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=94300.0, ans=0.2 2024-08-09 16:41:21,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=94400.0, ans=0.0 2024-08-09 16:41:27,606 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-09 16:41:28,917 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 16:41:32,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9450, loss[loss=0.1102, beats_loss=0.01168, ecapa_loss=0.0004772, whisper_loss=0.09373, over 15156.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01324, ecapa_loss=0.0004243, whisper_loss=0.1045, over 3854506.38 frames. ], batch size: 63, lr: 3.55e-02, grad_scale: 512.0 2024-08-09 16:41:34,273 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-09 16:41:57,096 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 16:42:09,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=94700.0, ans=15.0 2024-08-09 16:42:12,088 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-09 16:42:24,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-09 16:42:25,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=94900.0, ans=0.125 2024-08-09 16:42:28,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=94900.0, ans=0.125 2024-08-09 16:42:28,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.11 vs. limit=10.0 2024-08-09 16:42:36,517 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-09 16:42:40,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.979e+01 3.573e+01 4.112e+01 7.498e+01, threshold=7.146e+01, percent-clipped=2.0 2024-08-09 16:42:40,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9500, loss[loss=0.1217, beats_loss=0.01364, ecapa_loss=0.0004373, whisper_loss=0.1037, over 18375.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.01322, ecapa_loss=0.0004226, whisper_loss=0.1041, over 3851812.97 frames. ], batch size: 74, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:42:43,670 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 16:42:56,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.93 vs. limit=10.0 2024-08-09 16:42:59,693 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 16:43:02,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=95100.0, ans=0.125 2024-08-09 16:43:43,962 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.054e-01 2024-08-09 16:43:48,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9550, loss[loss=0.1162, beats_loss=0.0143, ecapa_loss=0.0004869, whisper_loss=0.09704, over 22373.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01318, ecapa_loss=0.0004233, whisper_loss=0.1043, over 3830319.05 frames. ], batch size: 95, lr: 3.54e-02, grad_scale: 512.0 2024-08-09 16:43:54,881 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-09 16:44:08,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=95600.0, ans=0.125 2024-08-09 16:44:19,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=95700.0, ans=0.125 2024-08-09 16:44:19,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2024-08-09 16:44:39,605 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
16 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 16:44:41,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=95800.0, ans=0.04949747468305833 2024-08-09 16:44:56,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.093e+01 3.544e+01 4.156e+01 7.056e+01, threshold=7.088e+01, percent-clipped=0.0 2024-08-09 16:44:56,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9600, loss[loss=0.1216, beats_loss=0.01527, ecapa_loss=0.0003876, whisper_loss=0.1025, over 16879.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01322, ecapa_loss=0.0004227, whisper_loss=0.1044, over 3866387.39 frames. ], batch size: 68, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:45:04,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96000.0, ans=0.125 2024-08-09 16:45:08,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=96100.0, ans=0.2 2024-08-09 16:45:14,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=15.0 2024-08-09 16:45:28,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=96200.0, ans=0.0 2024-08-09 16:46:04,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9650, loss[loss=0.1244, beats_loss=0.01226, ecapa_loss=0.0003833, whisper_loss=0.1083, over 19580.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01321, ecapa_loss=0.0004211, whisper_loss=0.1041, over 3841852.24 frames. 
], batch size: 72, lr: 3.53e-02, grad_scale: 512.0 2024-08-09 16:46:18,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96600.0, ans=0.125 2024-08-09 16:46:19,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-08-09 16:46:25,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.68 vs. limit=10.0 2024-08-09 16:46:33,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=96700.0, ans=0.2 2024-08-09 16:46:33,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=96700.0, ans=0.125 2024-08-09 16:46:52,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-08-09 16:46:58,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=96900.0, ans=0.125 2024-08-09 16:47:12,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.968e+01 3.449e+01 4.387e+01 7.611e+01, threshold=6.898e+01, percent-clipped=2.0 2024-08-09 16:47:12,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9700, loss[loss=0.1166, beats_loss=0.01522, ecapa_loss=0.0004932, whisper_loss=0.09647, over 21672.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01311, ecapa_loss=0.0004234, whisper_loss=0.1046, over 3853943.54 frames. ], batch size: 93, lr: 3.52e-02, grad_scale: 512.0 2024-08-09 16:47:14,363 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 16:47:37,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=97100.0, ans=0.125 2024-08-09 16:47:42,314 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 16:47:45,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=97200.0, ans=0.125 2024-08-09 16:47:45,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.78 vs. limit=22.5 2024-08-09 16:47:54,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=97300.0, ans=0.125 2024-08-09 16:48:22,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9750, loss[loss=0.1191, beats_loss=0.01241, ecapa_loss=0.0004251, whisper_loss=0.1025, over 14964.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01316, ecapa_loss=0.0004202, whisper_loss=0.1043, over 3859488.70 frames. ], batch size: 57, lr: 3.51e-02, grad_scale: 512.0 2024-08-09 16:48:30,896 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-09 16:48:38,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-09 16:49:00,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.94 vs. 
limit=12.0
2024-08-09 16:49:01,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=97700.0, ans=0.2
2024-08-09 16:49:03,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=97800.0, ans=0.125
2024-08-09 16:49:09,435 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 from AS
2024-08-09 16:49:13,539 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 19 from LS+wenet, 21 from Vox, 51 from AS
2024-08-09 16:49:14,960 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 from AS
2024-08-09 16:49:22,985 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS
2024-08-09 16:49:23,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=97900.0, ans=0.125
2024-08-09 16:49:31,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.868e+01 3.333e+01 3.887e+01 7.337e+01, threshold=6.667e+01, percent-clipped=1.0
2024-08-09 16:49:31,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9800, loss[loss=0.1254, beats_loss=0.009652, ecapa_loss=0.0005187, whisper_loss=0.1105, over 17557.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01319, ecapa_loss=0.0004196, whisper_loss=0.1032, over 3839905.32 frames. ], batch size: 71, lr: 3.51e-02, grad_scale: 512.0
2024-08-09 16:49:52,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-09 16:50:08,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=98200.0, ans=0.2
2024-08-09 16:50:29,758 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts.
17 from LS+wenet, 25 from Vox, 28 from AS
2024-08-09 16:50:39,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=98400.0, ans=0.125
2024-08-09 16:50:47,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9850, loss[loss=0.1153, beats_loss=0.0107, ecapa_loss=0.0004735, whisper_loss=0.09982, over 16418.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01309, ecapa_loss=0.0004192, whisper_loss=0.1044, over 3827242.00 frames. ], batch size: 64, lr: 3.50e-02, grad_scale: 512.0
2024-08-09 16:51:02,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=98500.0, ans=0.0
2024-08-09 16:51:03,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=98600.0, ans=0.125
2024-08-09 16:51:26,545 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 from AS
2024-08-09 16:51:38,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=98800.0, ans=0.125
2024-08-09 16:51:53,249 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 13 from Vox, 40 from AS
2024-08-09 16:52:11,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.990e+01 3.470e+01 4.121e+01 8.675e+01, threshold=6.939e+01, percent-clipped=3.0
2024-08-09 16:52:11,616 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9900, loss[loss=0.111, beats_loss=0.01342, ecapa_loss=0.0003438, whisper_loss=0.0941, over 22495.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01311, ecapa_loss=0.0004148, whisper_loss=0.104, over 3844442.56 frames. ], batch size: 87, lr: 3.50e-02, grad_scale: 512.0
2024-08-09 16:52:33,510 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts.
31 from LS+wenet, 13 from Vox, 30 from AS
2024-08-09 16:52:33,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=99100.0, ans=0.125
2024-08-09 16:53:00,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=99200.0, ans=0.125
2024-08-09 16:53:04,918 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 31 from Vox, 34 from AS
2024-08-09 16:53:35,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 9950, loss[loss=0.1359, beats_loss=0.01467, ecapa_loss=0.0003405, whisper_loss=0.1178, over 23761.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.0132, ecapa_loss=0.0004118, whisper_loss=0.1041, over 3867827.65 frames. ], batch size: 93, lr: 3.49e-02, grad_scale: 512.0
2024-08-09 16:53:36,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0
2024-08-09 16:53:50,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=99500.0, ans=0.5
2024-08-09 16:53:53,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5
2024-08-09 16:53:55,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99600.0, ans=0.1
2024-08-09 16:54:16,394 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
21 from LS+wenet, 24 from Vox, 31 from AS
2024-08-09 16:54:18,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99700.0, ans=0.1
2024-08-09 16:54:21,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.88 vs. limit=22.5
2024-08-09 16:54:22,472 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 27 from Vox, 31 from AS
2024-08-09 16:54:25,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=99800.0, ans=0.125
2024-08-09 16:54:46,621 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 19 from Vox, 23 from AS
2024-08-09 16:54:53,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.893e+01 3.392e+01 3.870e+01 8.367e+01, threshold=6.783e+01, percent-clipped=1.0
2024-08-09 16:54:53,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10000, loss[loss=0.1217, beats_loss=0.01229, ecapa_loss=0.0004086, whisper_loss=0.1053, over 16036.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01315, ecapa_loss=0.0004149, whisper_loss=0.1047, over 3869582.74 frames. ], batch size: 61, lr: 3.49e-02, grad_scale: 1024.0
2024-08-09 16:55:15,010 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 14 from Vox, 23 from AS
2024-08-09 16:55:23,637 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 from AS
2024-08-09 16:55:29,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=100200.0, ans=0.09899494936611666
2024-08-09 16:55:32,159 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
33 from LS+wenet, 21 from Vox, 35 from AS
2024-08-09 16:55:32,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=15.0
2024-08-09 16:56:03,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10050, loss[loss=0.1285, beats_loss=0.01256, ecapa_loss=0.0004212, whisper_loss=0.1118, over 21332.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01307, ecapa_loss=0.0004127, whisper_loss=0.1055, over 3844320.15 frames. ], batch size: 86, lr: 3.48e-02, grad_scale: 1024.0
2024-08-09 16:56:07,041 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 37 from Vox, 30 from AS
2024-08-09 16:56:13,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=100500.0, ans=0.0
2024-08-09 16:56:17,007 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 from AS
2024-08-09 16:56:26,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100600.0, ans=0.1
2024-08-09 16:56:27,742 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 from AS
2024-08-09 16:56:45,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=100800.0, ans=0.1
2024-08-09 16:56:52,648 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 18 from Vox, 40 from AS
2024-08-09 16:57:00,768 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS
2024-08-09 16:57:02,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=100900.0, ans=0.1
2024-08-09 16:57:07,730 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
32 from LS+wenet, 23 from Vox, 30 from AS
2024-08-09 16:57:07,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=100900.0, ans=0.125
2024-08-09 16:57:12,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.921e+01 3.378e+01 4.111e+01 6.632e+01, threshold=6.756e+01, percent-clipped=0.0
2024-08-09 16:57:12,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10100, loss[loss=0.1051, beats_loss=0.01506, ecapa_loss=0.0004928, whisper_loss=0.08516, over 20983.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01311, ecapa_loss=0.0004097, whisper_loss=0.1055, over 3887373.12 frames. ], batch size: 93, lr: 3.47e-02, grad_scale: 1024.0
2024-08-09 16:57:14,156 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 31 from Vox, 32 from AS
2024-08-09 16:57:18,241 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 from AS
2024-08-09 16:57:27,758 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS
2024-08-09 16:57:33,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0
2024-08-09 16:58:08,921 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 from AS
2024-08-09 16:58:10,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=101400.0, ans=0.125
2024-08-09 16:58:20,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10150, loss[loss=0.1383, beats_loss=0.01308, ecapa_loss=0.0004097, whisper_loss=0.1211, over 22621.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01322, ecapa_loss=0.0004113, whisper_loss=0.1054, over 3929269.52 frames.
], batch size: 89, lr: 3.47e-02, grad_scale: 1024.0
2024-08-09 16:58:32,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=101500.0, ans=0.125
2024-08-09 16:58:46,285 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 from AS
2024-08-09 16:59:06,963 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS
2024-08-09 16:59:11,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101800.0, ans=0.125
2024-08-09 16:59:15,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=101900.0, ans=0.125
2024-08-09 16:59:27,305 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 16:59:29,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.923e+01 3.411e+01 4.089e+01 6.898e+01, threshold=6.822e+01, percent-clipped=2.0
2024-08-09 16:59:30,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10200, loss[loss=0.1238, beats_loss=0.01148, ecapa_loss=0.000549, whisper_loss=0.1068, over 15447.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01313, ecapa_loss=0.0004135, whisper_loss=0.105, over 3892787.76 frames. ], batch size: 64, lr: 3.46e-02, grad_scale: 1024.0
2024-08-09 16:59:31,765 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 18 from Vox, 41 from AS
2024-08-09 16:59:39,683 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts.
17 from LS+wenet, 18 from Vox, 32 from AS
2024-08-09 16:59:42,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=102100.0, ans=0.125
2024-08-09 16:59:53,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102100.0, ans=0.125
2024-08-09 17:00:02,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=102200.0, ans=22.5
2024-08-09 17:00:14,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102300.0, ans=0.1
2024-08-09 17:00:38,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10250, loss[loss=0.1296, beats_loss=0.01306, ecapa_loss=0.000333, whisper_loss=0.1132, over 21062.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01306, ecapa_loss=0.0004131, whisper_loss=0.1055, over 3911255.46 frames. ], batch size: 82, lr: 3.46e-02, grad_scale: 1024.0
2024-08-09 17:00:39,058 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS
2024-08-09 17:00:44,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=102500.0, ans=0.125
2024-08-09 17:00:45,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102500.0, ans=0.1
2024-08-09 17:00:48,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=102500.0, ans=0.0
2024-08-09 17:00:54,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.33 vs.
limit=15.0
2024-08-09 17:00:59,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=102600.0, ans=0.125
2024-08-09 17:01:05,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=102700.0, ans=0.125
2024-08-09 17:01:20,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2024-08-09 17:01:36,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=102900.0, ans=0.125
2024-08-09 17:01:37,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0
2024-08-09 17:01:47,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 2.938e+01 3.467e+01 4.292e+01 7.706e+01, threshold=6.934e+01, percent-clipped=1.0
2024-08-09 17:01:47,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10300, loss[loss=0.1076, beats_loss=0.01489, ecapa_loss=0.000452, whisper_loss=0.08818, over 21055.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01308, ecapa_loss=0.0004143, whisper_loss=0.1054, over 3919259.98 frames.
], batch size: 89, lr: 3.45e-02, grad_scale: 1024.0
2024-08-09 17:01:47,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=103000.0, ans=0.0
2024-08-09 17:01:51,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=103000.0, ans=0.125
2024-08-09 17:01:53,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=103000.0, ans=0.125
2024-08-09 17:02:03,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=103100.0, ans=0.0
2024-08-09 17:02:29,380 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 from AS
2024-08-09 17:02:41,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=15.0
2024-08-09 17:02:52,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5
2024-08-09 17:02:54,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10350, loss[loss=0.1368, beats_loss=0.01177, ecapa_loss=0.000485, whisper_loss=0.1202, over 16264.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.0131, ecapa_loss=0.0004138, whisper_loss=0.1057, over 3941757.55 frames. ], batch size: 66, lr: 3.45e-02, grad_scale: 1024.0
2024-08-09 17:03:04,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2024-08-09 17:03:05,527 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 30 from Vox, 37 from AS
2024-08-09 17:03:09,392 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts.
26 from LS+wenet, 14 from Vox, 16 from AS
2024-08-09 17:03:25,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0
2024-08-09 17:03:28,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103700.0, ans=0.1
2024-08-09 17:03:35,939 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 from AS
2024-08-09 17:03:36,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=103800.0, ans=0.125
2024-08-09 17:03:48,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=103900.0, ans=0.5
2024-08-09 17:03:52,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=103900.0, ans=0.125
2024-08-09 17:03:53,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=103900.0, ans=0.0
2024-08-09 17:03:54,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5
2024-08-09 17:03:57,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=103900.0, ans=0.125
2024-08-09 17:04:02,863 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 3.016e+01 3.413e+01 4.405e+01 7.924e+01, threshold=6.827e+01, percent-clipped=1.0
2024-08-09 17:04:02,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10400, loss[loss=0.1185, beats_loss=0.01359, ecapa_loss=0.0004079, whisper_loss=0.1009, over 18405.00 frames.
], tot_loss[loss=0.1227, beats_loss=0.01306, ecapa_loss=0.0004124, whisper_loss=0.1055, over 3909375.60 frames. ], batch size: 74, lr: 3.44e-02, grad_scale: 1024.0
2024-08-09 17:04:28,382 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS
2024-08-09 17:04:38,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=104200.0, ans=0.2
2024-08-09 17:04:42,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=104200.0, ans=0.0
2024-08-09 17:04:57,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=104400.0, ans=0.0
2024-08-09 17:05:12,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10450, loss[loss=0.1123, beats_loss=0.01386, ecapa_loss=0.0004298, whisper_loss=0.09418, over 21334.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01306, ecapa_loss=0.0004125, whisper_loss=0.105, over 3897925.40 frames. ], batch size: 90, lr: 3.44e-02, grad_scale: 1024.0
2024-08-09 17:05:15,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=104500.0, ans=0.125
2024-08-09 17:05:42,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0
2024-08-09 17:05:53,892 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
26 from LS+wenet, 26 from Vox, 39 from AS
2024-08-09 17:05:58,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=104800.0, ans=0.2
2024-08-09 17:06:14,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=104900.0, ans=0.0
2024-08-09 17:06:16,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.06 vs. limit=10.0
2024-08-09 17:06:17,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=104900.0, ans=0.125
2024-08-09 17:06:21,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=105000.0, ans=0.125
2024-08-09 17:06:22,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.355e+01 3.012e+01 3.451e+01 3.999e+01 6.423e+01, threshold=6.903e+01, percent-clipped=0.0
2024-08-09 17:06:22,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10500, loss[loss=0.1261, beats_loss=0.01145, ecapa_loss=0.0004012, whisper_loss=0.1107, over 22390.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01302, ecapa_loss=0.0004124, whisper_loss=0.105, over 3882193.21 frames. ], batch size: 90, lr: 3.43e-02, grad_scale: 1024.0
2024-08-09 17:06:34,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=105000.0, ans=0.0
2024-08-09 17:06:56,082 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 from AS
2024-08-09 17:07:08,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=105300.0, ans=0.125
2024-08-09 17:07:13,001 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
30 from LS+wenet, 24 from Vox, 34 from AS
2024-08-09 17:07:22,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=105400.0, ans=0.0
2024-08-09 17:07:28,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=105400.0, ans=0.125
2024-08-09 17:07:32,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10550, loss[loss=0.1317, beats_loss=0.01301, ecapa_loss=0.0003965, whisper_loss=0.1148, over 19022.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01312, ecapa_loss=0.0004115, whisper_loss=0.1041, over 3879179.25 frames. ], batch size: 74, lr: 3.43e-02, grad_scale: 1024.0
2024-08-09 17:07:42,017 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 from AS
2024-08-09 17:07:51,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=105600.0, ans=0.2
2024-08-09 17:08:03,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.11 vs. limit=15.0
2024-08-09 17:08:24,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0
2024-08-09 17:08:41,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.989e+01 3.482e+01 4.095e+01 9.318e+01, threshold=6.964e+01, percent-clipped=2.0
2024-08-09 17:08:41,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10600, loss[loss=0.1168, beats_loss=0.01401, ecapa_loss=0.0003988, whisper_loss=0.09881, over 21820.00 frames. ], tot_loss[loss=0.121, beats_loss=0.01314, ecapa_loss=0.0004102, whisper_loss=0.1038, over 3873401.29 frames.
], batch size: 90, lr: 3.42e-02, grad_scale: 1024.0
2024-08-09 17:08:43,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=106000.0, ans=0.2
2024-08-09 17:08:47,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106000.0, ans=0.1
2024-08-09 17:08:50,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=106000.0, ans=0.125
2024-08-09 17:08:55,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.82 vs. limit=22.5
2024-08-09 17:08:57,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5
2024-08-09 17:08:59,919 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS
2024-08-09 17:09:03,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106100.0, ans=0.1
2024-08-09 17:09:19,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=106200.0, ans=0.0
2024-08-09 17:09:38,229 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.229e+01
2024-08-09 17:09:43,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=106400.0, ans=0.2
2024-08-09 17:09:45,117 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 from AS
2024-08-09 17:09:46,497 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
27 from LS+wenet, 22 from Vox, 39 from AS
2024-08-09 17:09:48,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=106400.0, ans=0.2
2024-08-09 17:09:51,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10650, loss[loss=0.1418, beats_loss=0.01208, ecapa_loss=0.0004582, whisper_loss=0.1251, over 22118.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01317, ecapa_loss=0.0004071, whisper_loss=0.104, over 3851474.84 frames. ], batch size: 89, lr: 3.41e-02, grad_scale: 1024.0
2024-08-09 17:09:56,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=106500.0, ans=0.125
2024-08-09 17:09:58,761 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 from AS
2024-08-09 17:10:03,318 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS
2024-08-09 17:10:08,751 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 from AS
2024-08-09 17:10:17,163 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 from AS
2024-08-09 17:10:35,257 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 from AS
2024-08-09 17:10:42,040 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 23 from LS+wenet, 11 from Vox, 21 from AS
2024-08-09 17:10:42,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.22 vs. limit=10.0
2024-08-09 17:10:51,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0
2024-08-09 17:10:56,189 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
21 from LS+wenet, 16 from Vox, 27 from AS
2024-08-09 17:11:01,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 3.109e+01 3.454e+01 4.119e+01 5.374e+01, threshold=6.908e+01, percent-clipped=0.0
2024-08-09 17:11:01,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10700, loss[loss=0.1321, beats_loss=0.01083, ecapa_loss=0.0003588, whisper_loss=0.1177, over 16733.00 frames. ], tot_loss[loss=0.1214, beats_loss=0.01317, ecapa_loss=0.0004036, whisper_loss=0.1041, over 3831317.92 frames. ], batch size: 62, lr: 3.41e-02, grad_scale: 1024.0
2024-08-09 17:11:26,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107100.0, ans=0.125
2024-08-09 17:11:33,709 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 from AS
2024-08-09 17:12:06,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=107400.0, ans=0.0
2024-08-09 17:12:07,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=107400.0, ans=0.125
2024-08-09 17:12:10,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10750, loss[loss=0.1281, beats_loss=0.01332, ecapa_loss=0.0003074, whisper_loss=0.1117, over 16240.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01322, ecapa_loss=0.0004059, whisper_loss=0.1044, over 3843549.47 frames. ], batch size: 61, lr: 3.40e-02, grad_scale: 1024.0
2024-08-09 17:12:12,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=107500.0, ans=0.0
2024-08-09 17:12:13,155 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts.
18 from LS+wenet, 20 from Vox, 26 from AS
2024-08-09 17:13:02,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=107800.0, ans=0.125
2024-08-09 17:13:10,109 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 from AS
2024-08-09 17:13:18,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.960e+01 3.558e+01 4.572e+01 9.073e+01, threshold=7.116e+01, percent-clipped=3.0
2024-08-09 17:13:18,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10800, loss[loss=0.1053, beats_loss=0.01695, ecapa_loss=0.0003076, whisper_loss=0.08525, over 22317.00 frames. ], tot_loss[loss=0.1216, beats_loss=0.0132, ecapa_loss=0.000405, whisper_loss=0.1043, over 3854053.10 frames. ], batch size: 88, lr: 3.40e-02, grad_scale: 1024.0
2024-08-09 17:13:30,958 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS
2024-08-09 17:13:33,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0
2024-08-09 17:13:44,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=108200.0, ans=0.2
2024-08-09 17:13:59,539 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 from AS
2024-08-09 17:14:03,242 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 from AS
2024-08-09 17:14:26,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.56 vs. limit=5.0
2024-08-09 17:14:26,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10850, loss[loss=0.1004, beats_loss=0.01336, ecapa_loss=0.0003657, whisper_loss=0.08335, over 21673.00 frames.
], tot_loss[loss=0.1214, beats_loss=0.01312, ecapa_loss=0.0004031, whisper_loss=0.1042, over 3878154.96 frames. ], batch size: 84, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:14:29,307 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 17:14:32,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=108500.0, ans=0.0 2024-08-09 17:14:55,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=108700.0, ans=10.0 2024-08-09 17:14:56,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.37 vs. limit=22.5 2024-08-09 17:15:02,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=12.0 2024-08-09 17:15:16,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=108800.0, ans=0.2 2024-08-09 17:15:28,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.32 vs. limit=15.0 2024-08-09 17:15:30,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=108900.0, ans=0.2 2024-08-09 17:15:35,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.154e+01 3.497e+01 4.138e+01 7.474e+01, threshold=6.993e+01, percent-clipped=1.0 2024-08-09 17:15:35,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10900, loss[loss=0.09338, beats_loss=0.0163, ecapa_loss=0.0003237, whisper_loss=0.07385, over 20652.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01312, ecapa_loss=0.0004024, whisper_loss=0.1043, over 3899278.66 frames. 
], batch size: 84, lr: 3.39e-02, grad_scale: 1024.0 2024-08-09 17:15:37,172 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-09 17:15:37,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=109000.0, ans=0.125 2024-08-09 17:15:41,332 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 17:15:41,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=109000.0, ans=0.125 2024-08-09 17:15:49,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-09 17:16:07,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=15.0 2024-08-09 17:16:13,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=109200.0, ans=0.125 2024-08-09 17:16:14,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=109200.0, ans=0.0 2024-08-09 17:16:39,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=109400.0, ans=0.0 2024-08-09 17:16:43,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 10950, loss[loss=0.09836, beats_loss=0.01469, ecapa_loss=0.0004145, whisper_loss=0.07953, over 18458.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01306, ecapa_loss=0.0004025, whisper_loss=0.1049, over 3909772.30 frames. 
], batch size: 77, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:17:02,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109600.0, ans=0.1 2024-08-09 17:17:11,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=109700.0, ans=0.125 2024-08-09 17:17:11,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=109700.0, ans=0.125 2024-08-09 17:17:17,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=109700.0, ans=0.125 2024-08-09 17:17:17,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=109700.0, ans=0.125 2024-08-09 17:17:18,173 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 22 from Vox, 14 fro AS 2024-08-09 17:17:19,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=109700.0, ans=0.0 2024-08-09 17:17:20,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=109700.0, ans=0.0 2024-08-09 17:17:23,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=12.0 2024-08-09 17:17:26,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.14 vs. 
limit=22.5 2024-08-09 17:17:43,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=109900.0, ans=0.0 2024-08-09 17:17:43,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=109900.0, ans=0.95 2024-08-09 17:17:48,873 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 29 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-09 17:17:51,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.947e+01 3.240e+01 3.931e+01 5.659e+01, threshold=6.481e+01, percent-clipped=0.0 2024-08-09 17:17:51,557 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11000, loss[loss=0.1294, beats_loss=0.009645, ecapa_loss=0.0004397, whisper_loss=0.1154, over 16460.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01305, ecapa_loss=0.0004029, whisper_loss=0.1044, over 3903380.99 frames. ], batch size: 65, lr: 3.38e-02, grad_scale: 1024.0 2024-08-09 17:17:59,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=12.0 2024-08-09 17:18:05,876 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-09 17:18:16,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=110100.0, ans=0.2 2024-08-09 17:18:21,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2024-08-09 17:18:28,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=110200.0, ans=0.0 2024-08-09 17:18:30,386 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
27 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-09 17:18:38,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=110300.0, ans=0.125 2024-08-09 17:19:00,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11050, loss[loss=0.1048, beats_loss=0.01479, ecapa_loss=0.0003287, whisper_loss=0.08673, over 15123.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01295, ecapa_loss=0.0004045, whisper_loss=0.1051, over 3928363.06 frames. ], batch size: 57, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:19:05,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=110500.0, ans=0.125 2024-08-09 17:19:07,779 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-09 17:19:12,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=110500.0, ans=0.125 2024-08-09 17:19:14,564 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.935e-01 2024-08-09 17:19:22,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=110600.0, ans=0.125 2024-08-09 17:19:40,335 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 17:19:48,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=110800.0, ans=0.125 2024-08-09 17:19:49,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110800.0, ans=0.0 2024-08-09 17:20:03,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.08 vs. 
limit=22.5 2024-08-09 17:20:10,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+01 3.030e+01 3.567e+01 4.272e+01 6.137e+01, threshold=7.134e+01, percent-clipped=0.0 2024-08-09 17:20:10,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11100, loss[loss=0.1326, beats_loss=0.01222, ecapa_loss=0.0003791, whisper_loss=0.1166, over 21788.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01298, ecapa_loss=0.0004041, whisper_loss=0.1048, over 3925270.65 frames. ], batch size: 84, lr: 3.37e-02, grad_scale: 1024.0 2024-08-09 17:20:17,021 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 17:20:38,091 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 17:20:47,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=111200.0, ans=0.2 2024-08-09 17:20:58,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.162e+00 2024-08-09 17:21:19,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11150, loss[loss=0.1553, beats_loss=0.01177, ecapa_loss=0.000472, whisper_loss=0.1388, over 22827.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01298, ecapa_loss=0.0004022, whisper_loss=0.1052, over 3919920.68 frames. ], batch size: 91, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:21:24,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=12.0 2024-08-09 17:21:57,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=111700.0, ans=0.2 2024-08-09 17:21:58,215 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-09 17:22:06,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=111800.0, ans=0.0 2024-08-09 17:22:06,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=111800.0, ans=0.0 2024-08-09 17:22:09,285 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-09 17:22:20,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=111900.0, ans=0.125 2024-08-09 17:22:21,864 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 17:22:22,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=111900.0, ans=0.2 2024-08-09 17:22:28,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.940e+01 3.532e+01 4.042e+01 6.455e+01, threshold=7.065e+01, percent-clipped=0.0 2024-08-09 17:22:28,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11200, loss[loss=0.1383, beats_loss=0.01265, ecapa_loss=0.00044, whisper_loss=0.1212, over 22280.00 frames. ], tot_loss[loss=0.122, beats_loss=0.01301, ecapa_loss=0.0004032, whisper_loss=0.105, over 3901883.88 frames. ], batch size: 89, lr: 3.36e-02, grad_scale: 1024.0 2024-08-09 17:22:33,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2024-08-09 17:22:34,326 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 31 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-09 17:22:39,830 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 17:22:40,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=112000.0, ans=0.125 2024-08-09 17:22:44,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=112100.0, ans=0.125 2024-08-09 17:22:47,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2024-08-09 17:22:51,008 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 17:23:03,758 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 17:23:06,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=112200.0, ans=0.0 2024-08-09 17:23:37,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11250, loss[loss=0.0976, beats_loss=0.01526, ecapa_loss=0.0003954, whisper_loss=0.07838, over 17336.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.013, ecapa_loss=0.0004033, whisper_loss=0.1051, over 3897411.20 frames. ], batch size: 71, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:23:47,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=22.5 2024-08-09 17:23:50,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112600.0, ans=0.1 2024-08-09 17:23:50,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112600.0, ans=0.125 2024-08-09 17:23:51,972 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 17:23:54,854 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-09 17:23:59,277 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-09 17:24:02,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.59 vs. limit=5.0 2024-08-09 17:24:04,596 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 17:24:06,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.34 vs. limit=22.5 2024-08-09 17:24:10,106 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 17:24:17,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=112700.0, ans=0.04949747468305833 2024-08-09 17:24:46,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=113000.0, ans=0.0 2024-08-09 17:24:47,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.986e+01 3.509e+01 4.225e+01 7.875e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-09 17:24:47,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11300, loss[loss=0.1131, beats_loss=0.01414, ecapa_loss=0.0003536, whisper_loss=0.09548, over 20742.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.013, ecapa_loss=0.000401, whisper_loss=0.1048, over 3895875.55 frames. ], batch size: 84, lr: 3.35e-02, grad_scale: 1024.0 2024-08-09 17:24:48,846 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-09 17:24:56,139 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-09 17:25:04,340 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 17:25:07,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.11 vs. limit=15.0 2024-08-09 17:25:09,620 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 17:25:16,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113200.0, ans=0.1 2024-08-09 17:25:37,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-08-09 17:25:42,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=113400.0, ans=0.125 2024-08-09 17:25:56,699 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11350, loss[loss=0.1186, beats_loss=0.01189, ecapa_loss=0.000426, whisper_loss=0.1025, over 14516.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01291, ecapa_loss=0.0004019, whisper_loss=0.105, over 3875045.13 frames. ], batch size: 59, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:26:19,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=113600.0, ans=0.2 2024-08-09 17:26:19,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113600.0, ans=0.1 2024-08-09 17:26:19,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. 
limit=6.0 2024-08-09 17:26:30,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2024-08-09 17:26:34,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2024-08-09 17:27:06,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.900e+01 3.368e+01 4.036e+01 6.013e+01, threshold=6.736e+01, percent-clipped=0.0 2024-08-09 17:27:06,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11400, loss[loss=0.1379, beats_loss=0.0118, ecapa_loss=0.0004073, whisper_loss=0.1221, over 15928.00 frames. ], tot_loss[loss=0.1219, beats_loss=0.01296, ecapa_loss=0.0004016, whisper_loss=0.1049, over 3882475.75 frames. ], batch size: 57, lr: 3.34e-02, grad_scale: 1024.0 2024-08-09 17:27:08,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=114000.0, ans=0.0 2024-08-09 17:27:17,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=12.0 2024-08-09 17:27:17,647 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 17:27:20,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=114100.0, ans=0.125 2024-08-09 17:27:23,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2024-08-09 17:27:27,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.12 vs. 
limit=15.0 2024-08-09 17:27:41,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114200.0, ans=0.1 2024-08-09 17:27:52,352 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 17:27:59,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=114300.0, ans=0.0 2024-08-09 17:28:10,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=114400.0, ans=0.125 2024-08-09 17:28:15,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11450, loss[loss=0.1216, beats_loss=0.01216, ecapa_loss=0.0003603, whisper_loss=0.1059, over 19238.00 frames. ], tot_loss[loss=0.1217, beats_loss=0.01292, ecapa_loss=0.0004018, whisper_loss=0.1048, over 3860563.12 frames. ], batch size: 75, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:28:30,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114600.0, ans=0.0 2024-08-09 17:28:34,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=114600.0, ans=0.125 2024-08-09 17:28:36,260 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 17:28:46,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=114700.0, ans=0.125 2024-08-09 17:29:26,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 3.054e+01 3.515e+01 4.307e+01 8.084e+01, threshold=7.029e+01, percent-clipped=1.0 2024-08-09 17:29:27,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11500, loss[loss=0.1354, beats_loss=0.008628, ecapa_loss=0.0004605, whisper_loss=0.1221, over 17452.00 frames. 
], tot_loss[loss=0.1225, beats_loss=0.01292, ecapa_loss=0.0004022, whisper_loss=0.1055, over 3874686.91 frames. ], batch size: 69, lr: 3.33e-02, grad_scale: 1024.0 2024-08-09 17:29:32,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-09 17:29:49,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5 2024-08-09 17:29:53,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=115100.0, ans=0.0 2024-08-09 17:29:54,694 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 17:30:10,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=115200.0, ans=0.125 2024-08-09 17:30:17,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=115300.0, ans=0.0 2024-08-09 17:30:19,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0 2024-08-09 17:30:41,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11550, loss[loss=0.1084, beats_loss=0.01434, ecapa_loss=0.0002902, whisper_loss=0.09111, over 15394.00 frames. ], tot_loss[loss=0.1225, beats_loss=0.01304, ecapa_loss=0.0003996, whisper_loss=0.1055, over 3901696.90 frames. 
], batch size: 55, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:30:47,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=115500.0, ans=0.09899494936611666 2024-08-09 17:30:54,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=115600.0, ans=0.0 2024-08-09 17:31:18,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=115700.0, ans=0.0 2024-08-09 17:31:30,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115800.0, ans=0.125 2024-08-09 17:31:36,064 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 16 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-09 17:31:49,365 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 17:31:53,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.957e+01 3.409e+01 3.917e+01 8.485e+01, threshold=6.817e+01, percent-clipped=1.0 2024-08-09 17:31:53,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11600, loss[loss=0.08404, beats_loss=0.0145, ecapa_loss=0.0004059, whisper_loss=0.06548, over 19010.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01309, ecapa_loss=0.0003959, whisper_loss=0.1056, over 3940749.28 frames. ], batch size: 79, lr: 3.32e-02, grad_scale: 1024.0 2024-08-09 17:32:09,283 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 17:32:14,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=116100.0, ans=0.0 2024-08-09 17:32:16,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=10.0 2024-08-09 17:32:17,719 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 17:32:18,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-08-09 17:32:48,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-08-09 17:32:54,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=116400.0, ans=0.125 2024-08-09 17:32:59,638 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 17:32:59,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=116400.0, ans=0.125 2024-08-09 17:33:05,681 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 42 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 17:33:07,019 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11650, loss[loss=0.1656, beats_loss=0.008841, ecapa_loss=0.0004003, whisper_loss=0.1527, over 23444.00 frames. ], tot_loss[loss=0.1223, beats_loss=0.01311, ecapa_loss=0.0003938, whisper_loss=0.1053, over 3967000.06 frames. ], batch size: 90, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:33:17,397 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 17:33:20,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=116600.0, ans=0.125 2024-08-09 17:33:22,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=116600.0, ans=0.125 2024-08-09 17:33:31,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0 2024-08-09 17:33:42,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=116700.0, ans=0.0 2024-08-09 17:34:16,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=116900.0, ans=0.0 2024-08-09 17:34:18,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.244e+01 3.106e+01 3.561e+01 4.217e+01 8.775e+01, threshold=7.122e+01, percent-clipped=2.0 2024-08-09 17:34:18,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11700, loss[loss=0.1529, beats_loss=0.0113, ecapa_loss=0.0003973, whisper_loss=0.1376, over 23391.00 frames. ], tot_loss[loss=0.1229, beats_loss=0.01299, ecapa_loss=0.0003922, whisper_loss=0.106, over 3934807.50 frames. ], batch size: 91, lr: 3.31e-02, grad_scale: 1024.0 2024-08-09 17:34:19,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-08-09 17:34:26,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=117000.0, ans=0.125 2024-08-09 17:34:36,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.86 vs. 
limit=22.5 2024-08-09 17:34:47,797 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-09 17:34:57,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-09 17:35:04,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=117300.0, ans=0.09899494936611666 2024-08-09 17:35:25,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=117400.0, ans=0.0 2024-08-09 17:35:30,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11750, loss[loss=0.1171, beats_loss=0.01441, ecapa_loss=0.0004151, whisper_loss=0.09857, over 21681.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01308, ecapa_loss=0.0003917, whisper_loss=0.1052, over 3924622.48 frames. ], batch size: 89, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:35:31,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2024-08-09 17:35:32,242 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 17:35:46,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=117600.0, ans=0.2 2024-08-09 17:35:47,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.94 vs. limit=15.0 2024-08-09 17:35:54,504 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 17:36:02,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. 
limit=15.0 2024-08-09 17:36:09,984 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 13 from Vox, 52 fro AS 2024-08-09 17:36:11,669 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 17:36:11,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=117800.0, ans=0.125 2024-08-09 17:36:19,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=117800.0, ans=0.0 2024-08-09 17:36:25,646 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 17:36:35,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=117900.0, ans=10.0 2024-08-09 17:36:40,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.943e+01 3.344e+01 4.022e+01 9.659e+01, threshold=6.689e+01, percent-clipped=1.0 2024-08-09 17:36:40,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11800, loss[loss=0.1107, beats_loss=0.01132, ecapa_loss=0.0004128, whisper_loss=0.09529, over 18919.00 frames. ], tot_loss[loss=0.1221, beats_loss=0.01307, ecapa_loss=0.0003936, whisper_loss=0.1051, over 3900186.64 frames. ], batch size: 78, lr: 3.30e-02, grad_scale: 1024.0 2024-08-09 17:36:41,881 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-09 17:36:48,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=118000.0, ans=0.125 2024-08-09 17:36:48,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.91 vs. 
limit=10.0 2024-08-09 17:37:01,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=118100.0, ans=0.07 2024-08-09 17:37:02,520 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 from AS 2024-08-09 17:37:09,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2024-08-09 17:37:11,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=118200.0, ans=0.125 2024-08-09 17:37:13,926 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-09 17:37:15,488 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 14 from Vox, 37 from AS 2024-08-09 17:37:32,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-08-09 17:37:39,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=118400.0, ans=0.2 2024-08-09 17:37:51,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11850, loss[loss=0.1119, beats_loss=0.01346, ecapa_loss=0.0004137, whisper_loss=0.09431, over 20312.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01316, ecapa_loss=0.0003928, whisper_loss=0.1047, over 3922593.82 frames. ], batch size: 84, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:37:57,790 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 22 from Vox, 39 from AS 2024-08-09 17:38:00,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=118500.0, ans=0.125 2024-08-09 17:38:11,311 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.413e-02 2024-08-09 17:38:32,556 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-09 17:39:03,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.477e+01 2.937e+01 3.452e+01 4.190e+01 6.711e+01, threshold=6.904e+01, percent-clipped=1.0 2024-08-09 17:39:03,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11900, loss[loss=0.1261, beats_loss=0.01158, ecapa_loss=0.0003996, whisper_loss=0.1105, over 22445.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01318, ecapa_loss=0.0003907, whisper_loss=0.1051, over 3932593.20 frames. ], batch size: 92, lr: 3.29e-02, grad_scale: 1024.0 2024-08-09 17:39:17,308 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 from AS 2024-08-09 17:39:21,453 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 from AS 2024-08-09 17:39:27,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-09 17:39:51,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=119300.0, ans=10.0 2024-08-09 17:39:57,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-09 17:39:58,642 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 16 from Vox, 43 from AS 2024-08-09 17:40:07,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-09 17:40:12,385 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 24 from Vox, 22 from AS 2024-08-09 17:40:14,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=119400.0, ans=0.2 2024-08-09 17:40:17,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 11950, loss[loss=0.1045, beats_loss=0.01419, ecapa_loss=0.000392, whisper_loss=0.08637, over 18706.00 frames. ], tot_loss[loss=0.1227, beats_loss=0.01304, ecapa_loss=0.0003924, whisper_loss=0.1058, over 3921464.50 frames. ], batch size: 77, lr: 3.28e-02, grad_scale: 1024.0 2024-08-09 17:40:32,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119600.0, ans=0.1 2024-08-09 17:40:37,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119600.0, ans=0.0 2024-08-09 17:40:41,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=119600.0, ans=0.125 2024-08-09 17:40:41,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-08-09 17:40:48,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=15.0 2024-08-09 17:41:11,062 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 from AS 2024-08-09 17:41:12,376 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 19 from Vox, 43 from AS 2024-08-09 17:41:32,937 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-12000.pt 2024-08-09 17:41:36,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2024-08-09 17:41:36,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.930e+01 3.462e+01 4.384e+01 7.473e+01, threshold=6.925e+01, percent-clipped=1.0 2024-08-09 17:41:36,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12000, loss[loss=0.113, beats_loss=0.01337, ecapa_loss=0.000401, whisper_loss=0.09564, over 19561.00 frames. ], tot_loss[loss=0.1222, beats_loss=0.01299, ecapa_loss=0.000393, whisper_loss=0.1053, over 3900875.17 frames. ], batch size: 79, lr: 3.28e-02, grad_scale: 2048.0 2024-08-09 17:41:36,942 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 17:42:24,864 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on ASR_libri: loss=0.2866, beats_loss=0, ecapa_loss=0.00111, whisper_loss=0.2755, over 922467.00 frames. 2024-08-09 17:42:44,961 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on SV_voxceleb1: loss=0.01049, beats_loss=0, ecapa_loss=0.001049, whisper_loss=0, over 939242.00 frames. 2024-08-09 17:43:20,528 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8934, 2.8337, 3.2369, 2.3790], device='cuda:0') 2024-08-09 17:44:38,292 INFO [train_multi_KD3.py:1149] (0/4) Epoch 1, validation on AT_audioset: loss=0.03131, beats_loss=0.03131, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-09 17:44:38,296 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 17:44:44,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=120000.0, ans=0.2 2024-08-09 17:44:49,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=120000.0, ans=0.2 2024-08-09 17:44:51,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=120000.0, ans=0.015 2024-08-09 17:44:53,657 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 10 from Vox, 31 from AS 2024-08-09 17:44:59,308 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 16 from LS+wenet, 25 from Vox, 35 from AS 2024-08-09 17:45:01,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=120100.0, ans=0.125 2024-08-09 17:45:02,759 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 17:45:06,965 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 from AS 2024-08-09 17:45:07,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-09 17:45:24,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=120300.0, ans=0.125 2024-08-09 17:45:30,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. 
limit=22.5 2024-08-09 17:45:53,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=120400.0, ans=0.125 2024-08-09 17:45:57,311 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12050, loss[loss=0.09767, beats_loss=0.01676, ecapa_loss=0.0003497, whisper_loss=0.07741, over 23110.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01296, ecapa_loss=0.000391, whisper_loss=0.1049, over 3867696.45 frames. ], batch size: 96, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:46:18,895 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-09 17:46:41,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=120800.0, ans=0.1 2024-08-09 17:46:52,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=120800.0, ans=0.05 2024-08-09 17:46:57,791 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 17:47:12,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.990e+01 3.554e+01 4.139e+01 7.218e+01, threshold=7.107e+01, percent-clipped=1.0 2024-08-09 17:47:12,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12100, loss[loss=0.1298, beats_loss=0.009805, ecapa_loss=0.0004227, whisper_loss=0.1157, over 16484.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01293, ecapa_loss=0.0003909, whisper_loss=0.1044, over 3848608.31 frames. ], batch size: 62, lr: 3.27e-02, grad_scale: 2048.0 2024-08-09 17:47:21,840 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
27 from LS+wenet, 25 from Vox, 27 from AS 2024-08-09 17:47:23,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=121000.0, ans=0.0 2024-08-09 17:47:47,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5 2024-08-09 17:48:05,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.71 vs. limit=10.0 2024-08-09 17:48:10,188 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 17:48:21,692 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 from AS 2024-08-09 17:48:29,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12150, loss[loss=0.1411, beats_loss=0.009537, ecapa_loss=0.0004316, whisper_loss=0.1273, over 20277.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01299, ecapa_loss=0.0003898, whisper_loss=0.104, over 3841891.73 frames. 
], batch size: 79, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:48:46,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=121600.0, ans=0.125 2024-08-09 17:49:15,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=121800.0, ans=0.0 2024-08-09 17:49:21,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=121800.0, ans=0.2 2024-08-09 17:49:45,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.869e+01 3.277e+01 4.136e+01 6.270e+01, threshold=6.555e+01, percent-clipped=0.0 2024-08-09 17:49:46,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12200, loss[loss=0.1226, beats_loss=0.0128, ecapa_loss=0.0003658, whisper_loss=0.1061, over 20074.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01306, ecapa_loss=0.0003904, whisper_loss=0.1038, over 3848785.37 frames. ], batch size: 79, lr: 3.26e-02, grad_scale: 2048.0 2024-08-09 17:49:51,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.74 vs. limit=22.5 2024-08-09 17:50:28,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=122200.0, ans=0.125 2024-08-09 17:50:37,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=15.0 2024-08-09 17:50:57,714 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 from AS 2024-08-09 17:51:01,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12250, loss[loss=0.1108, beats_loss=0.01319, ecapa_loss=0.000305, whisper_loss=0.09452, over 15814.00 frames. 
], tot_loss[loss=0.1211, beats_loss=0.01306, ecapa_loss=0.0003875, whisper_loss=0.1042, over 3886944.08 frames. ], batch size: 61, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:51:02,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122500.0, ans=0.125 2024-08-09 17:51:20,352 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 from AS 2024-08-09 17:51:42,072 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 from AS 2024-08-09 17:51:47,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.37 vs. limit=15.0 2024-08-09 17:51:49,800 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-09 17:52:01,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122900.0, ans=0.125 2024-08-09 17:52:17,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.887e+01 3.272e+01 4.030e+01 7.099e+01, threshold=6.544e+01, percent-clipped=1.0 2024-08-09 17:52:17,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12300, loss[loss=0.09431, beats_loss=0.01453, ecapa_loss=0.0003431, whisper_loss=0.07634, over 14052.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01309, ecapa_loss=0.0003901, whisper_loss=0.1037, over 3897637.56 frames. 
], batch size: 55, lr: 3.25e-02, grad_scale: 2048.0 2024-08-09 17:52:17,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=123000.0, ans=0.5 2024-08-09 17:52:19,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=123000.0, ans=0.125 2024-08-09 17:52:26,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-09 17:52:43,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123100.0, ans=0.1 2024-08-09 17:52:46,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123200.0, ans=0.1 2024-08-09 17:53:00,088 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 18 from Vox, 49 from AS 2024-08-09 17:53:21,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=123400.0, ans=0.0 2024-08-09 17:53:31,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12350, loss[loss=0.1192, beats_loss=0.009311, ecapa_loss=0.0004831, whisper_loss=0.105, over 21795.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01308, ecapa_loss=0.0003908, whisper_loss=0.1028, over 3890158.92 frames. ], batch size: 95, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:53:41,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=123500.0, ans=0.2 2024-08-09 17:53:46,055 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
28 from LS+wenet, 19 from Vox, 39 from AS 2024-08-09 17:53:59,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=123600.0, ans=0.1 2024-08-09 17:53:59,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=123600.0, ans=0.125 2024-08-09 17:54:05,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=123700.0, ans=0.1 2024-08-09 17:54:07,466 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 from AS 2024-08-09 17:54:20,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=123800.0, ans=0.125 2024-08-09 17:54:37,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123900.0, ans=0.1 2024-08-09 17:54:38,207 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 24 from Vox, 25 from AS 2024-08-09 17:54:43,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=123900.0, ans=0.1 2024-08-09 17:54:48,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.013e+01 3.404e+01 4.023e+01 7.879e+01, threshold=6.808e+01, percent-clipped=3.0 2024-08-09 17:54:48,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12400, loss[loss=0.09939, beats_loss=0.0116, ecapa_loss=0.000304, whisper_loss=0.08475, over 19747.00 frames. ], tot_loss[loss=0.1207, beats_loss=0.01299, ecapa_loss=0.0003894, whisper_loss=0.1038, over 3888393.79 frames. ], batch size: 73, lr: 3.24e-02, grad_scale: 2048.0 2024-08-09 17:54:52,803 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
18 from LS+wenet, 27 from Vox, 25 from AS 2024-08-09 17:55:19,636 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 17:55:25,508 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 from AS 2024-08-09 17:55:25,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=124200.0, ans=0.0 2024-08-09 17:55:32,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124300.0, ans=0.0 2024-08-09 17:55:33,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=124300.0, ans=0.05 2024-08-09 17:55:42,984 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS 2024-08-09 17:55:51,519 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 from AS 2024-08-09 17:55:54,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=124400.0, ans=0.125 2024-08-09 17:56:00,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12450, loss[loss=0.1146, beats_loss=0.01651, ecapa_loss=0.0003745, whisper_loss=0.09432, over 18926.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01303, ecapa_loss=0.0003881, whisper_loss=0.1037, over 3882830.43 frames. ], batch size: 78, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:56:03,381 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 17:56:03,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=124500.0, ans=0.125 2024-08-09 17:56:24,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=124600.0, ans=0.015 2024-08-09 17:56:34,483 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 from AS 2024-08-09 17:56:41,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=124700.0, ans=0.09899494936611666 2024-08-09 17:56:44,021 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 from AS 2024-08-09 17:56:51,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0 2024-08-09 17:56:53,519 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 from AS 2024-08-09 17:57:14,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 2.994e+01 3.498e+01 4.030e+01 6.153e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-09 17:57:14,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12500, loss[loss=0.1412, beats_loss=0.0103, ecapa_loss=0.0003944, whisper_loss=0.1269, over 20748.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01297, ecapa_loss=0.0003877, whisper_loss=0.1044, over 3892490.06 frames. ], batch size: 81, lr: 3.23e-02, grad_scale: 2048.0 2024-08-09 17:57:14,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=125000.0, ans=0.125 2024-08-09 17:57:24,444 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 27 from Vox, 45 from AS 2024-08-09 17:57:28,902 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 23 from Vox, 41 from AS 2024-08-09 17:57:30,409 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 17:57:48,431 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 from AS 2024-08-09 17:58:09,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=125300.0, ans=0.025 2024-08-09 17:58:09,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=125300.0, ans=0.125 2024-08-09 17:58:17,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=125400.0, ans=0.125 2024-08-09 17:58:20,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125400.0, ans=0.1 2024-08-09 17:58:23,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=125400.0, ans=0.125 2024-08-09 17:58:28,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12550, loss[loss=0.1174, beats_loss=0.01209, ecapa_loss=0.0003773, whisper_loss=0.1016, over 17749.00 frames. ], tot_loss[loss=0.1209, beats_loss=0.01307, ecapa_loss=0.0003892, whisper_loss=0.104, over 3914251.87 frames. 
], batch size: 70, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:58:37,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=125500.0, ans=0.0 2024-08-09 17:58:37,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=125500.0, ans=0.09899494936611666 2024-08-09 17:58:45,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=125600.0, ans=0.125 2024-08-09 17:58:45,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.58 vs. limit=15.0 2024-08-09 17:58:49,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125600.0, ans=0.1 2024-08-09 17:59:02,739 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 from AS 2024-08-09 17:59:03,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=125700.0, ans=0.125 2024-08-09 17:59:10,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=125700.0, ans=0.125 2024-08-09 17:59:43,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.067e+01 3.520e+01 4.433e+01 6.633e+01, threshold=7.039e+01, percent-clipped=0.0 2024-08-09 17:59:43,282 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12600, loss[loss=0.1326, beats_loss=0.01361, ecapa_loss=0.0004351, whisper_loss=0.1146, over 21898.00 frames. ], tot_loss[loss=0.1212, beats_loss=0.01294, ecapa_loss=0.0003894, whisper_loss=0.1043, over 3914794.45 frames. 
], batch size: 91, lr: 3.22e-02, grad_scale: 2048.0 2024-08-09 17:59:55,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2024-08-09 18:00:11,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=126200.0, ans=0.125 2024-08-09 18:00:14,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-09 18:00:42,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.49 vs. limit=22.5 2024-08-09 18:00:55,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12650, loss[loss=0.1272, beats_loss=0.01188, ecapa_loss=0.0004018, whisper_loss=0.1113, over 23269.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01302, ecapa_loss=0.0003877, whisper_loss=0.1036, over 3860435.78 frames. ], batch size: 93, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:00:55,802 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 12 from Vox, 40 from AS 2024-08-09 18:01:01,240 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS 2024-08-09 18:01:08,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=126600.0, ans=0.125 2024-08-09 18:01:15,551 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 from AS 2024-08-09 18:01:38,067 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-09 18:01:51,334 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 18:02:08,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.930e+01 3.194e+01 3.853e+01 8.153e+01, threshold=6.388e+01, percent-clipped=1.0 2024-08-09 18:02:08,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12700, loss[loss=0.1194, beats_loss=0.01467, ecapa_loss=0.0004202, whisper_loss=0.1006, over 22503.00 frames. ], tot_loss[loss=0.1213, beats_loss=0.01297, ecapa_loss=0.0003881, whisper_loss=0.1044, over 3849930.54 frames. ], batch size: 93, lr: 3.21e-02, grad_scale: 2048.0 2024-08-09 18:02:11,567 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 from AS 2024-08-09 18:02:17,158 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 23 from Vox, 24 from AS 2024-08-09 18:02:26,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=127100.0, ans=0.0 2024-08-09 18:02:41,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127200.0, ans=0.125 2024-08-09 18:02:47,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.93 vs. limit=15.0 2024-08-09 18:03:12,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=127400.0, ans=0.125 2024-08-09 18:03:20,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=127400.0, ans=0.0 2024-08-09 18:03:22,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12750, loss[loss=0.1294, beats_loss=0.01099, ecapa_loss=0.0004742, whisper_loss=0.1137, over 21355.00 frames. ], tot_loss[loss=0.1218, beats_loss=0.01296, ecapa_loss=0.000387, whisper_loss=0.105, over 3896277.47 frames. 
], batch size: 91, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:03:23,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.93 vs. limit=5.0 2024-08-09 18:03:34,947 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 from AS 2024-08-09 18:03:52,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=127700.0, ans=0.5 2024-08-09 18:03:53,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=127700.0, ans=0.0 2024-08-09 18:03:57,775 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 24 from Vox, 34 from AS 2024-08-09 18:04:07,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=127800.0, ans=0.05 2024-08-09 18:04:20,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=127900.0, ans=0.0 2024-08-09 18:04:29,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=127900.0, ans=0.125 2024-08-09 18:04:33,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.049e+01 3.510e+01 3.985e+01 5.812e+01, threshold=7.020e+01, percent-clipped=0.0 2024-08-09 18:04:33,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12800, loss[loss=0.1099, beats_loss=0.01256, ecapa_loss=0.0004532, whisper_loss=0.0928, over 15528.00 frames. ], tot_loss[loss=0.1215, beats_loss=0.01291, ecapa_loss=0.0003906, whisper_loss=0.1047, over 3896131.52 frames. 
], batch size: 66, lr: 3.20e-02, grad_scale: 2048.0 2024-08-09 18:04:36,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=128000.0, ans=0.0 2024-08-09 18:04:45,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128100.0, ans=0.125 2024-08-09 18:04:47,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=128100.0, ans=0.0 2024-08-09 18:04:50,044 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS 2024-08-09 18:05:00,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=128200.0, ans=0.125 2024-08-09 18:05:12,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=128200.0, ans=0.05 2024-08-09 18:05:22,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=128300.0, ans=0.125 2024-08-09 18:05:27,925 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-09 18:05:31,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=128400.0, ans=0.1 2024-08-09 18:05:40,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=128400.0, ans=0.0 2024-08-09 18:05:41,719 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 28 from LS+wenet, 25 from Vox, 43 from AS 2024-08-09 18:05:44,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12850, loss[loss=0.06946, beats_loss=0.01657, ecapa_loss=0.0003691, whisper_loss=0.0492, over 13758.00 frames. 
], tot_loss[loss=0.1204, beats_loss=0.01301, ecapa_loss=0.00039, whisper_loss=0.1035, over 3877714.65 frames. ], batch size: 57, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:05:53,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=128500.0, ans=0.0 2024-08-09 18:06:20,956 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 18:06:27,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=128800.0, ans=0.2 2024-08-09 18:06:28,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=128800.0, ans=0.125 2024-08-09 18:06:32,787 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 18:06:49,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=128900.0, ans=0.125 2024-08-09 18:06:57,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.723e+01 3.295e+01 4.012e+01 6.106e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-09 18:06:57,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12900, loss[loss=0.1194, beats_loss=0.01263, ecapa_loss=0.0004457, whisper_loss=0.1023, over 20280.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01293, ecapa_loss=0.00039, whisper_loss=0.1035, over 3875212.73 frames. ], batch size: 85, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:07:03,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=129000.0, ans=0.0 2024-08-09 18:07:15,895 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 18:07:45,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=129300.0, ans=0.0 2024-08-09 18:07:57,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129400.0, ans=0.1 2024-08-09 18:08:08,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 12950, loss[loss=0.08316, beats_loss=0.01585, ecapa_loss=0.0004405, whisper_loss=0.06291, over 19414.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01291, ecapa_loss=0.0003883, whisper_loss=0.1033, over 3892100.90 frames. ], batch size: 86, lr: 3.19e-02, grad_scale: 2048.0 2024-08-09 18:08:49,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=129700.0, ans=0.2 2024-08-09 18:08:50,688 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 18:08:55,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129800.0, ans=0.0 2024-08-09 18:08:58,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=129800.0, ans=0.125 2024-08-09 18:09:08,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=129900.0, ans=0.125 2024-08-09 18:09:24,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 3.005e+01 3.464e+01 3.958e+01 5.866e+01, threshold=6.929e+01, percent-clipped=0.0 2024-08-09 18:09:24,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13000, loss[loss=0.08872, beats_loss=0.01454, ecapa_loss=0.000393, whisper_loss=0.07025, over 13173.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01287, ecapa_loss=0.0003885, whisper_loss=0.1038, over 3898567.07 frames. 
], batch size: 55, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:09:27,455 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 18:09:29,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=130000.0, ans=0.125 2024-08-09 18:09:40,486 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-09 18:09:43,402 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 18:09:44,764 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-09 18:09:46,209 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 18:09:50,226 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 18:09:52,995 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 18:09:56,239 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 18:10:12,609 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 18:10:14,160 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-09 18:10:15,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=130300.0, ans=0.125 2024-08-09 18:10:17,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=130300.0, ans=0.125 2024-08-09 18:10:24,757 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 18:10:30,044 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-09 18:10:38,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13050, loss[loss=0.1199, beats_loss=0.01475, ecapa_loss=0.000363, whisper_loss=0.1015, over 15516.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.0129, ecapa_loss=0.0003861, whisper_loss=0.1037, over 3912519.66 frames. ], batch size: 63, lr: 3.18e-02, grad_scale: 2048.0 2024-08-09 18:10:48,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=12.0 2024-08-09 18:10:51,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=130500.0, ans=0.125 2024-08-09 18:10:55,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=130600.0, ans=0.0 2024-08-09 18:10:59,117 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 26 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-09 18:11:02,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=130600.0, ans=0.2 2024-08-09 18:11:02,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=130600.0, ans=0.125 2024-08-09 18:11:02,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=130600.0, ans=0.0 2024-08-09 18:11:16,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=130700.0, ans=0.0 2024-08-09 18:11:29,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=130800.0, ans=0.2 2024-08-09 18:11:45,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=130800.0, ans=0.125 2024-08-09 18:11:51,136 INFO 
[train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-09 18:11:57,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2024-08-09 18:12:00,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-08-09 18:12:05,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=130900.0, ans=0.0 2024-08-09 18:12:08,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.885e+01 3.590e+01 4.189e+01 8.103e+01, threshold=7.179e+01, percent-clipped=1.0 2024-08-09 18:12:08,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13100, loss[loss=0.1171, beats_loss=0.01157, ecapa_loss=0.0003091, whisper_loss=0.1024, over 15353.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01288, ecapa_loss=0.0003847, whisper_loss=0.1036, over 3859255.33 frames. ], batch size: 59, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:12:33,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-09 18:13:06,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.01 vs. limit=10.0 2024-08-09 18:13:41,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13150, loss[loss=0.121, beats_loss=0.0145, ecapa_loss=0.0003345, whisper_loss=0.1031, over 19478.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01299, ecapa_loss=0.0003829, whisper_loss=0.1035, over 3857127.32 frames. ], batch size: 78, lr: 3.17e-02, grad_scale: 2048.0 2024-08-09 18:13:41,324 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-09 18:13:41,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=131500.0, ans=0.0 2024-08-09 18:13:59,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131500.0, ans=0.1 2024-08-09 18:14:06,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=131600.0, ans=0.125 2024-08-09 18:14:18,650 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 18:14:19,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2024-08-09 18:14:29,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=131700.0, ans=0.2 2024-08-09 18:14:32,161 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 18:14:39,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=131700.0, ans=0.2 2024-08-09 18:14:41,147 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 18:14:57,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131800.0, ans=0.1 2024-08-09 18:15:01,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=131800.0, ans=0.0 2024-08-09 18:15:03,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=131800.0, ans=0.125 2024-08-09 18:15:26,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=131900.0, ans=0.07 2024-08-09 18:15:26,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=9.843e-02 2024-08-09 18:15:31,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.963e+01 3.357e+01 4.080e+01 6.559e+01, threshold=6.714e+01, percent-clipped=0.0 2024-08-09 18:15:31,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13200, loss[loss=0.113, beats_loss=0.01164, ecapa_loss=0.0003785, whisper_loss=0.09759, over 20407.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.013, ecapa_loss=0.0003819, whisper_loss=0.1035, over 3828627.24 frames. ], batch size: 81, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:15:31,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=132000.0, ans=0.125 2024-08-09 18:15:32,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=132000.0, ans=0.0 2024-08-09 18:15:48,160 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-09 18:17:06,679 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-09 18:17:11,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=132400.0, ans=0.2 2024-08-09 18:17:16,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13250, loss[loss=0.1268, beats_loss=0.01388, ecapa_loss=0.0003783, whisper_loss=0.1091, over 18023.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01298, ecapa_loss=0.0003823, whisper_loss=0.1026, over 3830473.56 frames. ], batch size: 75, lr: 3.16e-02, grad_scale: 2048.0 2024-08-09 18:17:22,224 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 18:17:28,198 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 18:17:34,573 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-09 18:17:57,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2024-08-09 18:18:00,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0 2024-08-09 18:18:06,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=132700.0, ans=0.0 2024-08-09 18:18:07,069 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 18:18:10,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=132800.0, ans=0.0 2024-08-09 18:18:15,614 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 18:18:27,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=132900.0, ans=0.0 2024-08-09 18:18:31,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=132900.0, ans=0.2 2024-08-09 18:18:36,275 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-09 18:18:37,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=132900.0, ans=0.0 2024-08-09 18:18:39,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-09 18:18:40,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 2.970e+01 3.375e+01 4.348e+01 9.574e+01, threshold=6.749e+01, percent-clipped=3.0 2024-08-09 18:18:40,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13300, loss[loss=0.108, beats_loss=0.01729, ecapa_loss=0.0002478, whisper_loss=0.08822, over 24404.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01301, ecapa_loss=0.0003777, whisper_loss=0.1022, over 3861153.90 frames. 
], batch size: 94, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:18:42,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=133000.0, ans=0.035 2024-08-09 18:19:03,062 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.438e+00 2024-08-09 18:19:18,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=133200.0, ans=0.035 2024-08-09 18:19:18,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=133200.0, ans=0.0 2024-08-09 18:19:22,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=133300.0, ans=0.0 2024-08-09 18:19:23,696 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 18:19:50,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13350, loss[loss=0.1035, beats_loss=0.01541, ecapa_loss=0.0003322, whisper_loss=0.08474, over 17028.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.013, ecapa_loss=0.0003769, whisper_loss=0.1023, over 3868725.42 frames. ], batch size: 68, lr: 3.15e-02, grad_scale: 2048.0 2024-08-09 18:20:06,467 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 18:20:09,659 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 18:20:13,745 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 18:20:31,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133700.0, ans=0.1 2024-08-09 18:20:33,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=133800.0, ans=0.0 2024-08-09 18:20:58,899 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-09 18:21:03,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 3.029e+01 3.343e+01 3.897e+01 6.977e+01, threshold=6.687e+01, percent-clipped=1.0 2024-08-09 18:21:03,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13400, loss[loss=0.1198, beats_loss=0.01317, ecapa_loss=0.0004083, whisper_loss=0.1026, over 21390.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01301, ecapa_loss=0.0003779, whisper_loss=0.102, over 3844521.78 frames. ], batch size: 88, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:21:15,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=134000.0, ans=0.07 2024-08-09 18:21:19,247 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-09 18:21:19,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.22 vs. limit=22.5 2024-08-09 18:21:36,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=15.0 2024-08-09 18:21:37,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. 
limit=15.0 2024-08-09 18:21:39,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=134200.0, ans=0.95 2024-08-09 18:21:49,818 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-09 18:21:55,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=134300.0, ans=0.125 2024-08-09 18:21:55,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=134300.0, ans=0.0 2024-08-09 18:21:57,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-08-09 18:22:06,585 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 18:22:13,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13450, loss[loss=0.1306, beats_loss=0.0152, ecapa_loss=0.0002895, whisper_loss=0.1125, over 20252.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01305, ecapa_loss=0.0003777, whisper_loss=0.1022, over 3846198.12 frames. ], batch size: 77, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:22:28,272 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.009e-01 2024-08-09 18:22:33,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=134600.0, ans=0.07 2024-08-09 18:22:37,156 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 18:22:39,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.70 vs. 
limit=22.5 2024-08-09 18:22:58,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=134800.0, ans=0.125 2024-08-09 18:22:59,555 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-09 18:22:59,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=134800.0, ans=0.125 2024-08-09 18:23:15,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-08-09 18:23:23,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.856e+01 3.489e+01 4.024e+01 6.380e+01, threshold=6.978e+01, percent-clipped=0.0 2024-08-09 18:23:23,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13500, loss[loss=0.137, beats_loss=0.01384, ecapa_loss=0.0003279, whisper_loss=0.1199, over 23187.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01305, ecapa_loss=0.0003795, whisper_loss=0.1026, over 3828518.32 frames. 
], batch size: 92, lr: 3.14e-02, grad_scale: 2048.0 2024-08-09 18:23:29,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135000.0, ans=0.1 2024-08-09 18:23:44,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=135100.0, ans=0.2 2024-08-09 18:23:45,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=135100.0, ans=0.0 2024-08-09 18:23:51,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=135200.0, ans=0.125 2024-08-09 18:23:51,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=135200.0, ans=0.2 2024-08-09 18:23:54,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=135200.0, ans=0.0 2024-08-09 18:24:23,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2024-08-09 18:24:34,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13550, loss[loss=0.1252, beats_loss=0.01433, ecapa_loss=0.0003223, whisper_loss=0.1076, over 19932.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01307, ecapa_loss=0.0003777, whisper_loss=0.1028, over 3855040.80 frames. ], batch size: 79, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:24:40,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=135500.0, ans=0.2 2024-08-09 18:24:41,791 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-09 18:24:49,353 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 18:25:03,436 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-09 18:25:10,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=135700.0, ans=0.125 2024-08-09 18:25:25,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=135800.0, ans=0.125 2024-08-09 18:25:38,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=135900.0, ans=0.125 2024-08-09 18:25:40,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=135900.0, ans=0.2 2024-08-09 18:25:47,054 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 3.070e+01 3.576e+01 4.104e+01 5.875e+01, threshold=7.153e+01, percent-clipped=0.0 2024-08-09 18:25:47,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13600, loss[loss=0.1188, beats_loss=0.01179, ecapa_loss=0.0003486, whisper_loss=0.1035, over 17027.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.0131, ecapa_loss=0.000375, whisper_loss=0.1025, over 3849668.36 frames. ], batch size: 64, lr: 3.13e-02, grad_scale: 2048.0 2024-08-09 18:25:56,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=136000.0, ans=0.125 2024-08-09 18:25:56,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-09 18:26:01,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=136100.0, ans=0.035 2024-08-09 18:26:09,418 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-09 18:26:13,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=136100.0, ans=0.125 2024-08-09 18:26:26,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136200.0, ans=0.1 2024-08-09 18:26:28,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2024-08-09 18:26:42,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=136300.0, ans=0.0 2024-08-09 18:26:50,523 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 18:26:58,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13650, loss[loss=0.1371, beats_loss=0.01201, ecapa_loss=0.0003341, whisper_loss=0.1218, over 23911.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01322, ecapa_loss=0.000376, whisper_loss=0.1021, over 3862760.30 frames. ], batch size: 89, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:27:17,297 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 18:27:22,827 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 18:27:27,643 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 18:27:33,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136700.0, ans=0.1 2024-08-09 18:27:43,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=136800.0, ans=0.125 2024-08-09 18:27:48,843 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 18:27:54,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=136900.0, ans=0.2 2024-08-09 18:27:54,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=136900.0, ans=0.125 2024-08-09 18:28:01,520 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 18:28:05,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2024-08-09 18:28:07,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=136900.0, ans=0.125 2024-08-09 18:28:09,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.873e+01 3.233e+01 3.836e+01 5.786e+01, threshold=6.466e+01, percent-clipped=0.0 2024-08-09 18:28:09,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13700, loss[loss=0.1052, beats_loss=0.01368, ecapa_loss=0.0004489, whisper_loss=0.08699, over 20697.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01322, ecapa_loss=0.0003763, whisper_loss=0.1017, over 3869511.44 frames. ], batch size: 88, lr: 3.12e-02, grad_scale: 2048.0 2024-08-09 18:28:26,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=137100.0, ans=0.0 2024-08-09 18:28:27,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=137100.0, ans=0.125 2024-08-09 18:28:37,213 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 18:29:05,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=137400.0, ans=0.05 2024-08-09 18:29:10,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=137400.0, ans=0.125 2024-08-09 18:29:13,606 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 18:29:19,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=137500.0, ans=0.5 2024-08-09 18:29:20,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13750, loss[loss=0.1334, beats_loss=0.0111, ecapa_loss=0.0003338, whisper_loss=0.1189, over 19732.00 frames. ], tot_loss[loss=0.1194, beats_loss=0.01306, ecapa_loss=0.0003777, whisper_loss=0.1025, over 3891496.25 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:29:22,087 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.353e+00 2024-08-09 18:29:24,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137500.0, ans=0.1 2024-08-09 18:29:34,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137600.0, ans=0.1 2024-08-09 18:29:39,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=137600.0, ans=0.0 2024-08-09 18:29:44,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=137600.0, ans=0.2 2024-08-09 18:29:49,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, 
metric=13.92 vs. limit=15.0 2024-08-09 18:30:01,957 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 18:30:03,199 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-09 18:30:03,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=137800.0, ans=0.125 2024-08-09 18:30:03,600 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.317e-02 2024-08-09 18:30:05,752 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 18:30:11,283 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-09 18:30:22,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=12.0 2024-08-09 18:30:28,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+01 2.986e+01 3.490e+01 4.118e+01 8.159e+01, threshold=6.980e+01, percent-clipped=6.0 2024-08-09 18:30:28,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13800, loss[loss=0.1124, beats_loss=0.01287, ecapa_loss=0.0003373, whisper_loss=0.09613, over 18693.00 frames. ], tot_loss[loss=0.1197, beats_loss=0.01298, ecapa_loss=0.000378, whisper_loss=0.1029, over 3878600.59 frames. ], batch size: 75, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:30:32,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.18 vs. limit=22.5 2024-08-09 18:30:47,234 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
33 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-09 18:31:34,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138400.0, ans=0.1 2024-08-09 18:31:37,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13850, loss[loss=0.1145, beats_loss=0.0111, ecapa_loss=0.0004401, whisper_loss=0.09895, over 18895.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01286, ecapa_loss=0.0003786, whisper_loss=0.1034, over 3880505.00 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 2048.0 2024-08-09 18:32:02,249 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 18:32:14,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.50 vs. limit=22.5 2024-08-09 18:32:23,757 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 18:32:40,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0 2024-08-09 18:32:49,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.816e+01 3.337e+01 3.813e+01 6.629e+01, threshold=6.673e+01, percent-clipped=0.0 2024-08-09 18:32:49,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13900, loss[loss=0.1329, beats_loss=0.01234, ecapa_loss=0.0003441, whisper_loss=0.1171, over 23100.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.0129, ecapa_loss=0.0003765, whisper_loss=0.1037, over 3907592.37 frames. ], batch size: 91, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:32:55,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-08-09 18:32:58,245 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
31 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 18:33:30,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=139300.0, ans=0.125 2024-08-09 18:33:59,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-08-09 18:34:00,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 13950, loss[loss=0.1097, beats_loss=0.01438, ecapa_loss=0.0003252, whisper_loss=0.09207, over 21705.00 frames. ], tot_loss[loss=0.1204, beats_loss=0.01294, ecapa_loss=0.0003745, whisper_loss=0.1038, over 3912857.10 frames. ], batch size: 88, lr: 3.10e-02, grad_scale: 2048.0 2024-08-09 18:34:02,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=139500.0, ans=0.5 2024-08-09 18:34:20,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=139600.0, ans=0.125 2024-08-09 18:34:35,505 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-09 18:34:49,707 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-09 18:35:03,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=139900.0, ans=0.07 2024-08-09 18:35:09,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 3.055e+01 3.459e+01 4.049e+01 5.260e+01, threshold=6.917e+01, percent-clipped=0.0 2024-08-09 18:35:09,096 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14000, loss[loss=0.1192, beats_loss=0.01229, ecapa_loss=0.000383, whisper_loss=0.1031, over 19716.00 frames. ], tot_loss[loss=0.1205, beats_loss=0.01292, ecapa_loss=0.0003716, whisper_loss=0.1038, over 3890401.44 frames. 
], batch size: 78, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:35:10,799 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 18:35:22,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=140100.0, ans=0.0 2024-08-09 18:35:36,098 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-09 18:35:37,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=140200.0, ans=0.125 2024-08-09 18:35:40,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=140200.0, ans=0.125 2024-08-09 18:36:11,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=140400.0, ans=0.0 2024-08-09 18:36:13,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=140400.0, ans=0.0 2024-08-09 18:36:17,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=140500.0, ans=0.125 2024-08-09 18:36:18,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14050, loss[loss=0.1202, beats_loss=0.01595, ecapa_loss=0.0003366, whisper_loss=0.1009, over 20228.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01303, ecapa_loss=0.0003686, whisper_loss=0.1039, over 3893444.44 frames. ], batch size: 80, lr: 3.09e-02, grad_scale: 4096.0 2024-08-09 18:36:18,218 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 18:36:30,923 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-09 18:36:45,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=140700.0, ans=0.07 2024-08-09 18:36:53,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=140700.0, ans=0.125 2024-08-09 18:37:00,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140800.0, ans=0.1 2024-08-09 18:37:14,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=140900.0, ans=0.0 2024-08-09 18:37:17,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=140900.0, ans=10.0 2024-08-09 18:37:22,346 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-09 18:37:24,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=140900.0, ans=0.2 2024-08-09 18:37:27,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.062e+01 3.430e+01 4.130e+01 6.899e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 18:37:27,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14100, loss[loss=0.1283, beats_loss=0.01379, ecapa_loss=0.000357, whisper_loss=0.111, over 17989.00 frames. ], tot_loss[loss=0.1206, beats_loss=0.01306, ecapa_loss=0.0003689, whisper_loss=0.1038, over 3871095.65 frames. ], batch size: 72, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:37:40,478 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-09 18:37:45,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141100.0, ans=0.1 2024-08-09 18:37:48,615 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-09 18:37:48,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=141100.0, ans=0.0 2024-08-09 18:37:55,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.28 vs. limit=22.5 2024-08-09 18:37:59,576 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 18:38:18,082 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 18:38:19,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141300.0, ans=0.1 2024-08-09 18:38:19,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=141300.0, ans=0.0 2024-08-09 18:38:37,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14150, loss[loss=0.1222, beats_loss=0.01425, ecapa_loss=0.000375, whisper_loss=0.1042, over 15477.00 frames. ], tot_loss[loss=0.1203, beats_loss=0.01312, ecapa_loss=0.00037, whisper_loss=0.1035, over 3896818.08 frames. 
], batch size: 64, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:38:40,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=141500.0, ans=0.0 2024-08-09 18:38:50,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=141600.0, ans=0.0 2024-08-09 18:38:51,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=141600.0, ans=0.125 2024-08-09 18:39:05,976 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 18:39:12,648 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 18:39:40,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=15.0 2024-08-09 18:39:46,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141900.0, ans=0.1 2024-08-09 18:39:48,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.107e+01 3.530e+01 4.182e+01 6.705e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-09 18:39:48,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14200, loss[loss=0.1081, beats_loss=0.01576, ecapa_loss=0.0002715, whisper_loss=0.08966, over 20879.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.0131, ecapa_loss=0.0003688, whisper_loss=0.103, over 3892874.36 frames. ], batch size: 80, lr: 3.08e-02, grad_scale: 4096.0 2024-08-09 18:39:51,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.45 vs. 
limit=15.0 2024-08-09 18:39:56,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-09 18:40:11,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=142100.0, ans=0.0 2024-08-09 18:40:39,869 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 18:40:41,278 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-09 18:40:52,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=142400.0, ans=0.0 2024-08-09 18:41:02,835 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 18:41:03,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=142500.0, ans=0.125 2024-08-09 18:41:03,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2024-08-09 18:41:04,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14250, loss[loss=0.1389, beats_loss=0.01164, ecapa_loss=0.0003397, whisper_loss=0.1238, over 15473.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.01311, ecapa_loss=0.0003667, whisper_loss=0.1033, over 3904895.27 frames. ], batch size: 58, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:41:04,424 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-09 18:41:24,367 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 18:41:24,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. 
limit=15.0 2024-08-09 18:41:38,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=142700.0, ans=0.125 2024-08-09 18:41:56,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-09 18:42:19,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.991e+01 3.300e+01 4.002e+01 6.725e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-09 18:42:19,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14300, loss[loss=0.1029, beats_loss=0.01254, ecapa_loss=0.0003907, whisper_loss=0.08642, over 18115.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01314, ecapa_loss=0.0003664, whisper_loss=0.103, over 3931242.97 frames. ], batch size: 74, lr: 3.07e-02, grad_scale: 4096.0 2024-08-09 18:42:50,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=143200.0, ans=0.07 2024-08-09 18:43:02,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.12 vs. limit=15.0 2024-08-09 18:43:03,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=143300.0, ans=0.125 2024-08-09 18:43:05,543 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 18:43:10,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=143300.0, ans=0.2 2024-08-09 18:43:28,593 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-09 18:43:33,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14350, loss[loss=0.1125, beats_loss=0.01155, ecapa_loss=0.0004555, whisper_loss=0.0964, over 21893.00 frames. 
], tot_loss[loss=0.1201, beats_loss=0.01309, ecapa_loss=0.0003644, whisper_loss=0.1033, over 3919499.42 frames. ], batch size: 92, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:43:36,089 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 11 from Vox, 50 fro AS 2024-08-09 18:43:37,286 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-09 18:43:37,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143500.0, ans=0.1 2024-08-09 18:43:40,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=143500.0, ans=0.0 2024-08-09 18:43:58,672 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 18:44:05,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=143700.0, ans=0.0 2024-08-09 18:44:36,644 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 18:44:40,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=143900.0, ans=0.07 2024-08-09 18:44:41,415 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-09 18:44:48,780 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.978e+01 3.379e+01 3.872e+01 1.013e+02, threshold=6.758e+01, percent-clipped=3.0 2024-08-09 18:44:48,804 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14400, loss[loss=0.08583, beats_loss=0.01388, ecapa_loss=0.0003231, whisper_loss=0.06872, over 21430.00 frames. ], tot_loss[loss=0.12, beats_loss=0.0131, ecapa_loss=0.0003652, whisper_loss=0.1032, over 3946306.30 frames. 
], batch size: 89, lr: 3.06e-02, grad_scale: 4096.0 2024-08-09 18:44:49,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=144000.0, ans=0.125 2024-08-09 18:45:04,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=144100.0, ans=0.125 2024-08-09 18:45:06,865 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 18:45:26,564 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 18:45:28,081 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 18:45:35,857 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 18:45:36,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=144300.0, ans=0.2 2024-08-09 18:45:36,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=144300.0, ans=12.0 2024-08-09 18:46:01,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 1, batch 14450, loss[loss=0.1372, beats_loss=0.01198, ecapa_loss=0.0003519, whisper_loss=0.1217, over 22898.00 frames. ], tot_loss[loss=0.1201, beats_loss=0.0131, ecapa_loss=0.0003669, whisper_loss=0.1033, over 3919497.35 frames. 
], batch size: 90, lr: 3.05e-02, grad_scale: 4096.0 2024-08-09 18:46:03,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=144500.0, ans=0.04949747468305833 2024-08-09 18:46:03,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=144500.0, ans=0.0 2024-08-09 18:46:04,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=144500.0, ans=0.125 2024-08-09 18:46:14,749 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 18:46:23,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144600.0, ans=0.1 2024-08-09 18:46:24,532 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 18:46:26,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-09 18:46:32,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=144700.0, ans=0.0 2024-08-09 18:46:56,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=144800.0, ans=0.0 2024-08-09 18:46:56,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. limit=10.0 2024-08-09 18:47:01,092 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-09 18:47:02,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=144900.0, ans=0.2 2024-08-09 18:47:07,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=144900.0, ans=0.125 2024-08-09 18:47:11,929 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-1.pt 2024-08-09 18:47:50,715 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-09 18:47:51,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 0, loss[loss=0.1288, beats_loss=0.01348, ecapa_loss=0.0003456, whisper_loss=0.1118, over 23377.00 frames. ], tot_loss[loss=0.1288, beats_loss=0.01348, ecapa_loss=0.0003456, whisper_loss=0.1118, over 23377.00 frames. ], batch size: 93, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:47:51,901 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 18:48:33,872 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on ASR_libri: loss=0.287, beats_loss=0, ecapa_loss=0.001066, whisper_loss=0.2763, over 922467.00 frames. 2024-08-09 18:48:50,314 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on SV_voxceleb1: loss=0.009611, beats_loss=0, ecapa_loss=0.0009611, whisper_loss=0, over 939242.00 frames. 2024-08-09 18:50:45,501 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2225, 3.8051, 3.5530, 3.6536], device='cuda:0') 2024-08-09 18:50:53,465 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on AT_audioset: loss=0.0306, beats_loss=0.0306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-09 18:50:53,469 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 18:50:56,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.997e+01 3.426e+01 4.261e+01 6.161e+01, threshold=6.853e+01, percent-clipped=0.0 2024-08-09 18:51:11,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=144980.0, ans=0.125 2024-08-09 18:51:42,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=145080.0, ans=0.0 2024-08-09 18:51:53,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.788e-02 2024-08-09 18:52:40,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=145380.0, ans=0.07 2024-08-09 18:52:47,912 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 18:52:53,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=145380.0, ans=0.125 2024-08-09 18:52:53,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=145380.0, ans=0.2 2024-08-09 18:53:03,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 50, loss[loss=0.1282, beats_loss=0.014, ecapa_loss=0.0003913, whisper_loss=0.1102, over 18366.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01336, ecapa_loss=0.0003777, whisper_loss=0.1026, over 905489.82 frames. ], batch size: 74, lr: 2.99e-02, grad_scale: 4096.0 2024-08-09 18:53:37,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=145580.0, ans=0.125 2024-08-09 18:54:25,399 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
23 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 18:54:29,554 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-09 18:54:32,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145780.0, ans=0.125 2024-08-09 18:54:34,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145780.0, ans=0.1 2024-08-09 18:54:49,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=145880.0, ans=0.125 2024-08-09 18:54:53,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=145880.0, ans=0.125 2024-08-09 18:54:54,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0 2024-08-09 18:55:03,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 100, loss[loss=0.1086, beats_loss=0.01417, ecapa_loss=0.0003797, whisper_loss=0.09064, over 19194.00 frames. ], tot_loss[loss=0.1198, beats_loss=0.01306, ecapa_loss=0.0003711, whisper_loss=0.103, over 1506396.15 frames. ], batch size: 80, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:55:06,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=145980.0, ans=0.125 2024-08-09 18:55:07,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.227e+01 3.507e+01 4.114e+01 7.130e+01, threshold=7.014e+01, percent-clipped=1.0 2024-08-09 18:55:43,005 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 18:56:21,694 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-09 18:56:25,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=146280.0, ans=0.125 2024-08-09 18:56:40,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=146380.0, ans=0.09899494936611666 2024-08-09 18:56:53,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 150, loss[loss=0.1183, beats_loss=0.01321, ecapa_loss=0.0004119, whisper_loss=0.101, over 18065.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01303, ecapa_loss=0.0003687, whisper_loss=0.1016, over 1998512.62 frames. ], batch size: 76, lr: 2.98e-02, grad_scale: 4096.0 2024-08-09 18:56:53,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=146480.0, ans=0.2 2024-08-09 18:57:11,789 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-09 18:57:23,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=146580.0, ans=0.125 2024-08-09 18:57:32,545 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 25 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-09 18:57:56,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=146780.0, ans=0.125 2024-08-09 18:58:01,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146780.0, ans=0.1 2024-08-09 18:58:18,832 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 16 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 18:58:20,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 200, loss[loss=0.0747, beats_loss=0.0171, ecapa_loss=0.0003839, whisper_loss=0.05376, over 17593.00 frames. 
], tot_loss[loss=0.1183, beats_loss=0.01299, ecapa_loss=0.0003634, whisper_loss=0.1017, over 2398083.31 frames. ], batch size: 78, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:58:23,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.970e+01 3.444e+01 4.293e+01 6.916e+01, threshold=6.888e+01, percent-clipped=0.0 2024-08-09 18:58:27,034 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-09 18:58:52,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=147180.0, ans=0.125 2024-08-09 18:59:08,274 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-09 18:59:08,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147280.0, ans=0.1 2024-08-09 18:59:22,034 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 18:59:28,304 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 18:59:31,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=147380.0, ans=0.125 2024-08-09 18:59:39,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 250, loss[loss=0.1168, beats_loss=0.01216, ecapa_loss=0.0003077, whisper_loss=0.1015, over 20265.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01288, ecapa_loss=0.0003576, whisper_loss=0.1014, over 2702033.04 frames. ], batch size: 77, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 18:59:40,586 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 18:59:51,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=147480.0, ans=0.1 2024-08-09 18:59:53,106 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 18:59:54,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=147580.0, ans=0.125 2024-08-09 19:00:13,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=12.0 2024-08-09 19:00:21,536 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.620e-01 2024-08-09 19:00:26,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=147780.0, ans=0.09899494936611666 2024-08-09 19:00:41,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=147880.0, ans=0.0 2024-08-09 19:00:43,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0 2024-08-09 19:00:43,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=15.0 2024-08-09 19:00:54,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 300, loss[loss=0.1182, beats_loss=0.01212, ecapa_loss=0.0003345, whisper_loss=0.1028, over 20341.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01274, ecapa_loss=0.000355, whisper_loss=0.1021, over 2948761.77 frames. 
], batch size: 77, lr: 2.97e-02, grad_scale: 4096.0 2024-08-09 19:00:57,404 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 3.134e+01 3.449e+01 4.098e+01 7.776e+01, threshold=6.897e+01, percent-clipped=1.0 2024-08-09 19:01:02,188 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 from AS 2024-08-09 19:01:15,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=148080.0, ans=0.09899494936611666 2024-08-09 19:01:16,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=148080.0, ans=0.125 2024-08-09 19:01:50,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=148280.0, ans=0.125 2024-08-09 19:01:55,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148380.0, ans=0.1 2024-08-09 19:02:04,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148380.0, ans=0.1 2024-08-09 19:02:04,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=148380.0, ans=0.1 2024-08-09 19:02:08,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 350, loss[loss=0.1008, beats_loss=0.0153, ecapa_loss=0.0002576, whisper_loss=0.08287, over 17566.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.0128, ecapa_loss=0.0003514, whisper_loss=0.102, over 3132189.94 frames.
], batch size: 67, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:02:15,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=148480.0, ans=0.1 2024-08-09 19:02:23,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148580.0, ans=0.125 2024-08-09 19:02:24,587 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-09 19:02:38,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-08-09 19:02:45,455 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 19:02:47,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=148680.0, ans=0.05 2024-08-09 19:02:48,989 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 19:03:11,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148880.0, ans=0.125 2024-08-09 19:03:18,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148880.0, ans=0.125 2024-08-09 19:03:23,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 400, loss[loss=0.1325, beats_loss=0.0143, ecapa_loss=0.0003114, whisper_loss=0.115, over 18850.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01283, ecapa_loss=0.0003451, whisper_loss=0.1021, over 3300526.90 frames.
], batch size: 72, lr: 2.96e-02, grad_scale: 4096.0 2024-08-09 19:03:25,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.813e+01 3.235e+01 3.879e+01 6.977e+01, threshold=6.469e+01, percent-clipped=1.0 2024-08-09 19:03:27,244 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 from AS 2024-08-09 19:03:35,902 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS 2024-08-09 19:03:48,531 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 20 from Vox, 27 from AS 2024-08-09 19:03:53,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=149180.0, ans=0.125 2024-08-09 19:03:55,999 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS 2024-08-09 19:04:13,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149280.0, ans=0.125 2024-08-09 19:04:15,670 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 from AS 2024-08-09 19:04:19,283 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 from AS 2024-08-09 19:04:27,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=149380.0, ans=0.07 2024-08-09 19:04:28,556 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 19 from Vox, 28 from AS 2024-08-09 19:04:32,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=149380.0, ans=0.07 2024-08-09 19:04:38,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 450, loss[loss=0.1309, beats_loss=0.01144, ecapa_loss=0.0002998, whisper_loss=0.1164, over 22102.00 frames.
], tot_loss[loss=0.1181, beats_loss=0.01284, ecapa_loss=0.0003442, whisper_loss=0.1019, over 3419241.89 frames. ], batch size: 85, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:04:38,972 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 from AS 2024-08-09 19:04:45,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=149480.0, ans=0.025 2024-08-09 19:05:08,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=149680.0, ans=0.0 2024-08-09 19:05:14,118 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 28 from Vox, 39 from AS 2024-08-09 19:05:27,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149780.0, ans=0.0 2024-08-09 19:05:43,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149880.0, ans=0.0 2024-08-09 19:05:54,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 500, loss[loss=0.1013, beats_loss=0.01412, ecapa_loss=0.0003635, whisper_loss=0.08354, over 19980.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01283, ecapa_loss=0.0003436, whisper_loss=0.1014, over 3512902.16 frames. ], batch size: 87, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:05:57,090 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.962e+01 3.493e+01 4.226e+01 6.986e+01, threshold=6.987e+01, percent-clipped=1.0 2024-08-09 19:06:06,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.25 vs. limit=22.5 2024-08-09 19:06:19,705 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
24 from LS+wenet, 17 from Vox, 39 from AS 2024-08-09 19:06:25,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=150180.0, ans=0.0 2024-08-09 19:06:41,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150280.0, ans=0.1 2024-08-09 19:06:51,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=150280.0, ans=0.0 2024-08-09 19:06:57,271 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 19:07:10,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 550, loss[loss=0.1017, beats_loss=0.01686, ecapa_loss=0.0002959, whisper_loss=0.08193, over 20602.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01287, ecapa_loss=0.0003419, whisper_loss=0.1013, over 3598374.70 frames. ], batch size: 83, lr: 2.95e-02, grad_scale: 4096.0 2024-08-09 19:07:30,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2024-08-09 19:07:30,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2024-08-09 19:07:31,434 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.050e-01 2024-08-09 19:07:41,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150680.0, ans=0.1 2024-08-09 19:07:47,184 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 20 from Vox, 21 from AS 2024-08-09 19:07:50,227 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
34 from LS+wenet, 24 from Vox, 32 from AS 2024-08-09 19:07:57,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=150780.0, ans=0.0 2024-08-09 19:08:03,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=150780.0, ans=0.125 2024-08-09 19:08:05,318 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 21 from Vox, 40 from AS 2024-08-09 19:08:06,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=150780.0, ans=0.125 2024-08-09 19:08:11,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=150880.0, ans=0.125 2024-08-09 19:08:21,516 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 from AS 2024-08-09 19:08:21,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=150880.0, ans=0.125 2024-08-09 19:08:26,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 600, loss[loss=0.126, beats_loss=0.01544, ecapa_loss=0.000287, whisper_loss=0.1077, over 21389.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01283, ecapa_loss=0.000341, whisper_loss=0.1019, over 3680790.30 frames. ], batch size: 84, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:08:26,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2024-08-09 19:08:28,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.94 vs.
limit=22.5 2024-08-09 19:08:28,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.923e+01 3.308e+01 3.857e+01 5.897e+01, threshold=6.616e+01, percent-clipped=0.0 2024-08-09 19:08:33,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150980.0, ans=0.1 2024-08-09 19:08:43,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=151080.0, ans=0.0 2024-08-09 19:08:53,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=151080.0, ans=0.2 2024-08-09 19:08:55,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=151180.0, ans=0.125 2024-08-09 19:08:57,532 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 from AS 2024-08-09 19:08:58,859 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 27 from Vox, 41 from AS 2024-08-09 19:09:06,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=151180.0, ans=0.125 2024-08-09 19:09:07,660 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 22 from Vox, 27 from AS 2024-08-09 19:09:17,405 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 26 from Vox, 34 from AS 2024-08-09 19:09:20,228 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 from AS 2024-08-09 19:09:26,338 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 24 from Vox, 20 from AS 2024-08-09 19:09:34,386 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts.
24 from LS+wenet, 26 from Vox, 32 from AS 2024-08-09 19:09:40,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 650, loss[loss=0.09731, beats_loss=0.01649, ecapa_loss=0.0003325, whisper_loss=0.07749, over 18086.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.0128, ecapa_loss=0.0003423, whisper_loss=0.1014, over 3699739.70 frames. ], batch size: 77, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:10:02,282 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 from AS 2024-08-09 19:10:03,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-09 19:10:32,359 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS 2024-08-09 19:10:43,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=151880.0, ans=0.2 2024-08-09 19:10:43,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151880.0, ans=0.1 2024-08-09 19:10:55,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 700, loss[loss=0.09377, beats_loss=0.01705, ecapa_loss=0.0002803, whisper_loss=0.07391, over 14404.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01279, ecapa_loss=0.0003439, whisper_loss=0.1014, over 3687022.64 frames.
], batch size: 56, lr: 2.94e-02, grad_scale: 4096.0 2024-08-09 19:10:55,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=151980.0, ans=0.125 2024-08-09 19:10:57,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=151980.0, ans=0.125 2024-08-09 19:10:57,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=15.0 2024-08-09 19:10:57,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.682e+01 3.217e+01 3.765e+01 7.105e+01, threshold=6.434e+01, percent-clipped=1.0 2024-08-09 19:11:13,267 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 from AS 2024-08-09 19:11:22,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=152080.0, ans=0.0 2024-08-09 19:11:33,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2024-08-09 19:11:38,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=15.0 2024-08-09 19:11:52,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2024-08-09 19:11:56,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=152380.0, ans=0.0 2024-08-09 19:12:07,356 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts.
24 from LS+wenet, 22 from Vox, 31 from AS 2024-08-09 19:12:09,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=152480.0, ans=0.05 2024-08-09 19:12:10,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 750, loss[loss=0.1126, beats_loss=0.01355, ecapa_loss=0.0003228, whisper_loss=0.09583, over 22957.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01274, ecapa_loss=0.0003396, whisper_loss=0.102, over 3710230.49 frames. ], batch size: 91, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:12:42,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-09 19:12:43,638 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS 2024-08-09 19:12:47,225 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 from AS 2024-08-09 19:12:50,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2024-08-09 19:13:04,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=15.0 2024-08-09 19:13:20,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=152880.0, ans=0.125 2024-08-09 19:13:24,535 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-09 19:13:26,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 800, loss[loss=0.15, beats_loss=0.01018, ecapa_loss=0.000362, whisper_loss=0.1362, over 23725.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01289, ecapa_loss=0.0003358, whisper_loss=0.1013, over 3762005.42 frames.
], batch size: 92, lr: 2.93e-02, grad_scale: 4096.0 2024-08-09 19:13:30,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 2.796e+01 3.224e+01 3.871e+01 5.736e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-09 19:13:50,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=153080.0, ans=0.0 2024-08-09 19:13:56,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=153180.0, ans=0.125 2024-08-09 19:13:58,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=153180.0, ans=0.125 2024-08-09 19:14:00,638 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 12 from LS+wenet, 27 from Vox, 31 from AS 2024-08-09 19:14:15,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=153280.0, ans=0.125 2024-08-09 19:14:40,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=153380.0, ans=0.125 2024-08-09 19:14:43,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 850, loss[loss=0.1199, beats_loss=0.01123, ecapa_loss=0.0003507, whisper_loss=0.1052, over 18415.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01286, ecapa_loss=0.0003359, whisper_loss=0.1017, over 3789823.06 frames.
], batch size: 74, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:14:50,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=153480.0, ans=0.125 2024-08-09 19:15:02,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=153580.0, ans=0.125 2024-08-09 19:15:05,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=153580.0, ans=0.125 2024-08-09 19:15:08,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=153580.0, ans=0.0 2024-08-09 19:15:13,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=153680.0, ans=0.2 2024-08-09 19:15:16,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153680.0, ans=0.1 2024-08-09 19:15:27,519 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 20 from Vox, 29 from AS 2024-08-09 19:15:33,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=153780.0, ans=0.0 2024-08-09 19:15:40,810 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:15:53,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=153880.0, ans=0.05 2024-08-09 19:16:01,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153980.0, ans=0.125 2024-08-09 19:16:02,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 900, loss[loss=0.1035, beats_loss=0.01254, ecapa_loss=0.00033, whisper_loss=0.08769, over 18844.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.0129, ecapa_loss=0.0003332, whisper_loss=0.1011, over 3775810.06 frames. ], batch size: 75, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:16:04,864 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 33 from Vox, 39 from AS 2024-08-09 19:16:05,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.273e+01 2.893e+01 3.249e+01 3.934e+01 7.637e+01, threshold=6.497e+01, percent-clipped=1.0 2024-08-09 19:16:35,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0 2024-08-09 19:16:36,670 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 15 from Vox, 25 from AS 2024-08-09 19:16:40,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.34 vs.
limit=15.0 2024-08-09 19:16:53,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=154280.0, ans=0.04949747468305833 2024-08-09 19:16:53,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=15.0 2024-08-09 19:17:00,083 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 19:17:00,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=154280.0, ans=0.04949747468305833 2024-08-09 19:17:09,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2024-08-09 19:17:16,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=154380.0, ans=0.2 2024-08-09 19:17:19,555 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 950, loss[loss=0.09502, beats_loss=0.0129, ecapa_loss=0.0003353, whisper_loss=0.07876, over 16453.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01283, ecapa_loss=0.0003329, whisper_loss=0.1005, over 3790111.73 frames. ], batch size: 64, lr: 2.92e-02, grad_scale: 4096.0 2024-08-09 19:17:19,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154480.0, ans=0.0 2024-08-09 19:17:36,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=154580.0, ans=0.125 2024-08-09 19:17:37,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.32 vs.
limit=22.5 2024-08-09 19:17:38,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0 2024-08-09 19:17:44,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154580.0, ans=0.125 2024-08-09 19:17:45,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=154580.0, ans=0.2 2024-08-09 19:17:55,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=154680.0, ans=0.125 2024-08-09 19:18:09,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=154780.0, ans=0.125 2024-08-09 19:18:09,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=154780.0, ans=0.0 2024-08-09 19:18:19,528 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 from AS 2024-08-09 19:18:37,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1000, loss[loss=0.1452, beats_loss=0.009663, ecapa_loss=0.0004521, whisper_loss=0.1311, over 22096.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01293, ecapa_loss=0.0003341, whisper_loss=0.1002, over 3809646.90 frames. ], batch size: 89, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:18:38,014 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts.
28 from LS+wenet, 15 from Vox, 32 from AS 2024-08-09 19:18:41,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.941e+01 3.307e+01 3.877e+01 7.420e+01, threshold=6.613e+01, percent-clipped=2.0 2024-08-09 19:18:47,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154980.0, ans=0.125 2024-08-09 19:19:00,777 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 20 from Vox, 48 from AS 2024-08-09 19:19:02,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=155080.0, ans=0.125 2024-08-09 19:19:35,695 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 from AS 2024-08-09 19:19:40,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=155280.0, ans=0.125 2024-08-09 19:19:56,251 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 16 from Vox, 20 from AS 2024-08-09 19:19:59,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1050, loss[loss=0.1021, beats_loss=0.01581, ecapa_loss=0.0003123, whisper_loss=0.08317, over 18087.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01294, ecapa_loss=0.0003342, whisper_loss=0.1006, over 3815749.31 frames. ], batch size: 70, lr: 2.91e-02, grad_scale: 4096.0 2024-08-09 19:20:00,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=155480.0, ans=0.125 2024-08-09 19:20:03,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-08-09 19:20:07,511 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts.
16 from LS+wenet, 20 from Vox, 32 from AS 2024-08-09 19:20:19,319 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 11 from LS+wenet, 14 from Vox, 37 from AS 2024-08-09 19:20:19,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=155580.0, ans=0.125 2024-08-09 19:20:28,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-08-09 19:20:30,840 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 27 from Vox, 39 from AS 2024-08-09 19:20:32,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=155680.0, ans=0.0 2024-08-09 19:20:58,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=155880.0, ans=0.125 2024-08-09 19:21:03,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=155880.0, ans=0.125 2024-08-09 19:21:13,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1100, loss[loss=0.08425, beats_loss=0.01485, ecapa_loss=0.0002543, whisper_loss=0.06686, over 15387.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01292, ecapa_loss=0.0003341, whisper_loss=0.1004, over 3796279.80 frames. ], batch size: 55, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:21:17,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.935e+01 3.266e+01 4.117e+01 7.646e+01, threshold=6.532e+01, percent-clipped=3.0 2024-08-09 19:21:21,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-09 19:21:41,881 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
31 from LS+wenet, 16 from Vox, 33 from AS 2024-08-09 19:22:03,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=156280.0, ans=0.0 2024-08-09 19:22:20,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156380.0, ans=0.125 2024-08-09 19:22:24,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1150, loss[loss=0.1199, beats_loss=0.01081, ecapa_loss=0.0003263, whisper_loss=0.1058, over 21123.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01287, ecapa_loss=0.0003325, whisper_loss=0.1006, over 3792786.20 frames. ], batch size: 80, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:22:44,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-09 19:22:49,694 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 23 from Vox, 31 from AS 2024-08-09 19:22:59,888 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 from AS 2024-08-09 19:23:13,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-09 19:23:16,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156880.0, ans=0.1 2024-08-09 19:23:26,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=156880.0, ans=0.125 2024-08-09 19:23:30,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1200, loss[loss=0.1181, beats_loss=0.01217, ecapa_loss=0.0003753, whisper_loss=0.1022, over 17381.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01282, ecapa_loss=0.0003334, whisper_loss=0.1015, over 3788775.73 frames.
], batch size: 71, lr: 2.90e-02, grad_scale: 4096.0 2024-08-09 19:23:33,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+01 2.894e+01 3.270e+01 3.890e+01 7.018e+01, threshold=6.539e+01, percent-clipped=1.0 2024-08-09 19:23:51,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=157080.0, ans=0.125 2024-08-09 19:23:54,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-08-09 19:24:00,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=157180.0, ans=0.0 2024-08-09 19:24:00,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=157180.0, ans=0.125 2024-08-09 19:24:07,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=157180.0, ans=0.0 2024-08-09 19:24:12,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.31 vs. limit=12.0 2024-08-09 19:24:14,540 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS 2024-08-09 19:24:17,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-08-09 19:24:22,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2024-08-09 19:24:36,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1250, loss[loss=0.122, beats_loss=0.01159, ecapa_loss=0.000324, whisper_loss=0.1071, over 16797.00 frames.
], tot_loss[loss=0.1172, beats_loss=0.01282, ecapa_loss=0.0003323, whisper_loss=0.101, over 3797419.35 frames. ], batch size: 65, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:24:40,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=157480.0, ans=0.0 2024-08-09 19:24:41,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=157480.0, ans=10.0 2024-08-09 19:24:52,342 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 19:25:20,923 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-09 19:25:28,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157880.0, ans=0.1 2024-08-09 19:25:33,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=157880.0, ans=0.09899494936611666 2024-08-09 19:25:34,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=157880.0, ans=0.0 2024-08-09 19:25:39,592 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:25:40,594 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 19:25:40,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=157980.0, ans=0.125 2024-08-09 19:25:41,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1300, loss[loss=0.1003, beats_loss=0.01413, ecapa_loss=0.0002648, whisper_loss=0.0835, over 16871.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01286, ecapa_loss=0.0003318, whisper_loss=0.1005, over 3816609.05 frames. 
], batch size: 63, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:25:43,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=157980.0, ans=0.0 2024-08-09 19:25:44,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.862e+01 3.141e+01 3.804e+01 7.057e+01, threshold=6.283e+01, percent-clipped=1.0 2024-08-09 19:25:50,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2024-08-09 19:26:01,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2024-08-09 19:26:03,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=158080.0, ans=0.0 2024-08-09 19:26:08,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-09 19:26:11,987 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-09 19:26:15,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=158180.0, ans=0.0 2024-08-09 19:26:27,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=158280.0, ans=0.125 2024-08-09 19:26:29,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=158280.0, ans=0.125 2024-08-09 19:26:47,424 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1350, loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.0003498, whisper_loss=0.1003, over 21893.00 frames. 
], tot_loss[loss=0.1165, beats_loss=0.01287, ecapa_loss=0.0003301, whisper_loss=0.1004, over 3814269.93 frames. ], batch size: 89, lr: 2.89e-02, grad_scale: 4096.0 2024-08-09 19:26:57,420 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 19:26:59,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158480.0, ans=0.0 2024-08-09 19:27:06,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=158580.0, ans=0.95 2024-08-09 19:27:06,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158580.0, ans=0.125 2024-08-09 19:27:06,661 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.595e-02 2024-08-09 19:27:10,056 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-09 19:27:20,996 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 19:27:27,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=158780.0, ans=0.125 2024-08-09 19:27:46,034 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 19:27:47,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158880.0, ans=0.1 2024-08-09 19:27:53,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1400, loss[loss=0.1226, beats_loss=0.01069, ecapa_loss=0.000361, whisper_loss=0.1083, over 18289.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01281, ecapa_loss=0.0003306, whisper_loss=0.1001, over 3783613.01 frames. 
], batch size: 73, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:27:54,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=158980.0, ans=0.125 2024-08-09 19:27:56,782 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.826e+01 3.197e+01 3.856e+01 5.556e+01, threshold=6.395e+01, percent-clipped=0.0 2024-08-09 19:27:57,027 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 19:28:07,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=159080.0, ans=0.025 2024-08-09 19:28:19,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-08-09 19:28:20,267 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-09 19:28:21,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-09 19:28:34,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=159280.0, ans=0.125 2024-08-09 19:28:41,421 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-09 19:28:54,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=159380.0, ans=0.0 2024-08-09 19:29:00,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1450, loss[loss=0.09982, beats_loss=0.01539, ecapa_loss=0.0003413, whisper_loss=0.08102, over 18804.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01288, ecapa_loss=0.0003284, whisper_loss=0.09998, over 3824139.92 frames. 
], batch size: 79, lr: 2.88e-02, grad_scale: 4096.0 2024-08-09 19:29:27,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=159480.0, ans=0.125 2024-08-09 19:29:48,336 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 27 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-09 19:29:49,581 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 19:29:51,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159680.0, ans=0.125 2024-08-09 19:29:55,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=159680.0, ans=0.0 2024-08-09 19:30:01,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=159680.0, ans=0.125 2024-08-09 19:30:18,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=159880.0, ans=0.125 2024-08-09 19:30:34,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1500, loss[loss=0.1303, beats_loss=0.01053, ecapa_loss=0.0003216, whisper_loss=0.1166, over 20278.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01295, ecapa_loss=0.0003312, whisper_loss=0.09949, over 3821501.52 frames. 
], batch size: 77, lr: 2.87e-02, grad_scale: 4096.0 2024-08-09 19:30:36,131 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-16000.pt 2024-08-09 19:30:39,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.965e+01 3.414e+01 4.022e+01 6.981e+01, threshold=6.828e+01, percent-clipped=1.0 2024-08-09 19:30:43,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.29 vs. limit=15.0 2024-08-09 19:30:48,762 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-09 19:30:50,343 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-09 19:30:57,790 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.274e+00 2024-08-09 19:31:00,532 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 19:31:02,012 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
27 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 19:31:22,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160280.0, ans=0.125 2024-08-09 19:31:38,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160380.0, ans=0.125 2024-08-09 19:31:39,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=160380.0, ans=0.2 2024-08-09 19:31:42,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=160380.0, ans=0.125 2024-08-09 19:31:43,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-09 19:31:44,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=160380.0, ans=0.125 2024-08-09 19:31:44,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-08-09 19:31:46,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2024-08-09 19:31:53,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=160480.0, ans=0.0 2024-08-09 19:31:54,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1550, loss[loss=0.1219, beats_loss=0.01233, ecapa_loss=0.0003214, whisper_loss=0.1063, over 15938.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01292, ecapa_loss=0.0003302, whisper_loss=0.09994, over 3836433.98 frames. 
], batch size: 63, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:31:55,586 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 19:32:02,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=160480.0, ans=0.0 2024-08-09 19:32:03,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=160480.0, ans=0.125 2024-08-09 19:32:05,077 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-09 19:32:18,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=160580.0, ans=0.125 2024-08-09 19:32:25,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=160680.0, ans=0.025 2024-08-09 19:32:31,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=160680.0, ans=0.0 2024-08-09 19:32:36,414 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 19:32:56,803 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 19:33:03,530 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-09 19:33:12,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1600, loss[loss=0.1171, beats_loss=0.01066, ecapa_loss=0.0002976, whisper_loss=0.1035, over 17863.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01282, ecapa_loss=0.0003293, whisper_loss=0.1007, over 3847932.27 frames. 
], batch size: 68, lr: 2.87e-02, grad_scale: 8192.0 2024-08-09 19:33:16,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.968e+01 3.450e+01 4.320e+01 7.036e+01, threshold=6.900e+01, percent-clipped=1.0 2024-08-09 19:33:28,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=161080.0, ans=0.125 2024-08-09 19:33:47,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=161180.0, ans=0.125 2024-08-09 19:34:07,524 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 19:34:12,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=161280.0, ans=0.5 2024-08-09 19:34:18,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=161380.0, ans=0.0 2024-08-09 19:34:30,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1650, loss[loss=0.1023, beats_loss=0.01629, ecapa_loss=0.0002985, whisper_loss=0.08299, over 21410.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01292, ecapa_loss=0.000326, whisper_loss=0.1009, over 3841177.01 frames. ], batch size: 88, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:34:39,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=33.72 vs. limit=22.5 2024-08-09 19:34:44,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161580.0, ans=0.1 2024-08-09 19:34:46,622 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-09 19:34:56,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=161580.0, ans=0.0 2024-08-09 19:35:06,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=161680.0, ans=0.0 2024-08-09 19:35:15,368 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-09 19:35:17,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=161780.0, ans=0.125 2024-08-09 19:35:18,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.01 vs. limit=15.0 2024-08-09 19:35:19,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=161780.0, ans=0.2 2024-08-09 19:35:40,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=161880.0, ans=0.125 2024-08-09 19:35:42,503 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-09 19:35:45,571 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1700, loss[loss=0.08745, beats_loss=0.01536, ecapa_loss=0.0003463, whisper_loss=0.06863, over 17445.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01285, ecapa_loss=0.000325, whisper_loss=0.1013, over 3844837.14 frames. 
], batch size: 75, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:35:48,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.753e+01 3.153e+01 3.657e+01 6.641e+01, threshold=6.306e+01, percent-clipped=0.0 2024-08-09 19:35:52,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161980.0, ans=0.1 2024-08-09 19:35:52,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.44 vs. limit=22.5 2024-08-09 19:36:11,052 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 12 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-09 19:36:14,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=162180.0, ans=0.0 2024-08-09 19:36:33,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-09 19:36:51,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=162380.0, ans=0.125 2024-08-09 19:36:52,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.39 vs. limit=8.0 2024-08-09 19:36:57,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162380.0, ans=0.1 2024-08-09 19:36:59,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1750, loss[loss=0.1093, beats_loss=0.01207, ecapa_loss=0.0002723, whisper_loss=0.09455, over 15909.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.0128, ecapa_loss=0.0003247, whisper_loss=0.1012, over 3855765.33 frames. 
], batch size: 58, lr: 2.86e-02, grad_scale: 8192.0 2024-08-09 19:37:00,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=162480.0, ans=0.125 2024-08-09 19:37:07,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162480.0, ans=0.1 2024-08-09 19:37:08,954 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 19:37:20,344 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 19:37:34,580 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 19:37:40,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162680.0, ans=0.125 2024-08-09 19:37:46,688 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 19:38:03,732 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 19:38:16,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1800, loss[loss=0.1213, beats_loss=0.01282, ecapa_loss=0.0003038, whisper_loss=0.1055, over 22639.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01278, ecapa_loss=0.0003284, whisper_loss=0.1012, over 3851786.91 frames. 
], batch size: 91, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:38:18,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.809e+01 3.330e+01 3.752e+01 6.796e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-09 19:38:39,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=163080.0, ans=0.125 2024-08-09 19:38:48,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=163180.0, ans=0.125 2024-08-09 19:38:52,612 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 19:39:05,565 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-09 19:39:10,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=163280.0, ans=0.0 2024-08-09 19:39:31,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1850, loss[loss=0.07899, beats_loss=0.01329, ecapa_loss=0.0003544, whisper_loss=0.06216, over 13558.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01279, ecapa_loss=0.0003315, whisper_loss=0.1007, over 3823351.46 frames. ], batch size: 54, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:39:36,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=163480.0, ans=0.0 2024-08-09 19:39:51,687 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-09 19:40:01,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=163680.0, ans=0.125 2024-08-09 19:40:01,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=10.0 2024-08-09 19:40:11,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=163680.0, ans=0.125 2024-08-09 19:40:12,627 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-09 19:40:24,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2024-08-09 19:40:34,236 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.476e+02 2024-08-09 19:40:42,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1900, loss[loss=0.1179, beats_loss=0.01337, ecapa_loss=0.0003154, whisper_loss=0.1013, over 20079.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01285, ecapa_loss=0.0003383, whisper_loss=0.1005, over 3843283.39 frames. ], batch size: 79, lr: 2.85e-02, grad_scale: 8192.0 2024-08-09 19:40:45,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.888e+01 3.200e+01 3.675e+01 7.363e+01, threshold=6.401e+01, percent-clipped=1.0 2024-08-09 19:40:49,915 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-09 19:40:50,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.82 vs. limit=10.0 2024-08-09 19:40:51,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=163980.0, ans=0.125 2024-08-09 19:41:04,751 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-09 19:41:16,359 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 19:41:24,319 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-09 19:41:30,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-08-09 19:41:32,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164280.0, ans=0.1 2024-08-09 19:41:35,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=164380.0, ans=0.0 2024-08-09 19:41:45,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=164380.0, ans=0.125 2024-08-09 19:41:49,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 1950, loss[loss=0.1061, beats_loss=0.009464, ecapa_loss=0.000376, whisper_loss=0.09291, over 15081.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01281, ecapa_loss=0.0003443, whisper_loss=0.1006, over 3855998.03 frames. ], batch size: 54, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:41:49,637 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 13 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-09 19:41:49,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=164480.0, ans=0.05 2024-08-09 19:41:52,448 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-09 19:42:06,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=164580.0, ans=0.09899494936611666 2024-08-09 19:42:09,635 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
25 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-09 19:42:13,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=164580.0, ans=0.125 2024-08-09 19:42:19,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=164680.0, ans=0.02 2024-08-09 19:42:31,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=164780.0, ans=0.05 2024-08-09 19:42:45,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=164880.0, ans=0.0 2024-08-09 19:42:49,247 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 19:42:55,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2000, loss[loss=0.1058, beats_loss=0.01217, ecapa_loss=0.0003821, whisper_loss=0.08977, over 17600.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01277, ecapa_loss=0.0003488, whisper_loss=0.101, over 3835662.16 frames. ], batch size: 72, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:42:58,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.959e+01 3.174e+01 3.680e+01 5.777e+01, threshold=6.348e+01, percent-clipped=0.0 2024-08-09 19:43:02,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=164980.0, ans=0.0 2024-08-09 19:43:09,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=165080.0, ans=0.125 2024-08-09 19:43:20,404 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
11 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 19:43:24,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=165180.0, ans=0.125 2024-08-09 19:43:31,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=165180.0, ans=0.05 2024-08-09 19:43:33,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=165280.0, ans=0.125 2024-08-09 19:43:48,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=165380.0, ans=0.125 2024-08-09 19:43:55,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=165380.0, ans=0.125 2024-08-09 19:44:01,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2050, loss[loss=0.1401, beats_loss=0.012, ecapa_loss=0.0003438, whisper_loss=0.1247, over 22384.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01287, ecapa_loss=0.0003534, whisper_loss=0.101, over 3866831.66 frames. ], batch size: 88, lr: 2.84e-02, grad_scale: 8192.0 2024-08-09 19:44:01,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=165480.0, ans=0.125 2024-08-09 19:44:07,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=165480.0, ans=0.125 2024-08-09 19:44:18,957 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 19:44:24,021 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 19:44:32,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.30 vs. 
limit=12.0 2024-08-09 19:44:41,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=165780.0, ans=0.05 2024-08-09 19:45:00,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=165880.0, ans=0.2 2024-08-09 19:45:06,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2100, loss[loss=0.1114, beats_loss=0.01218, ecapa_loss=0.0003835, whisper_loss=0.09535, over 16755.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01294, ecapa_loss=0.0003544, whisper_loss=0.1008, over 3855887.10 frames. ], batch size: 71, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:45:07,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165980.0, ans=0.1 2024-08-09 19:45:08,404 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 19:45:09,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.923e+01 3.262e+01 4.036e+01 6.421e+01, threshold=6.525e+01, percent-clipped=1.0 2024-08-09 19:45:23,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5 2024-08-09 19:45:36,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2024-08-09 19:45:41,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. 
limit=15.0 2024-08-09 19:45:43,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=166180.0, ans=0.125 2024-08-09 19:45:56,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=166280.0, ans=0.125 2024-08-09 19:45:58,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=166380.0, ans=0.125 2024-08-09 19:46:08,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2024-08-09 19:46:12,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2150, loss[loss=0.1292, beats_loss=0.01209, ecapa_loss=0.0003913, whisper_loss=0.1132, over 23052.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.013, ecapa_loss=0.0003515, whisper_loss=0.101, over 3879983.99 frames. ], batch size: 94, lr: 2.83e-02, grad_scale: 8192.0 2024-08-09 19:46:18,005 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 19:46:19,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=166480.0, ans=0.0 2024-08-09 19:46:22,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=166480.0, ans=0.0 2024-08-09 19:46:39,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=12.0 2024-08-09 19:46:40,306 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-09 19:46:56,992 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-09 19:46:57,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=166780.0, ans=0.0 2024-08-09 19:46:59,938 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-09 19:47:18,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2200, loss[loss=0.1215, beats_loss=0.01401, ecapa_loss=0.0002715, whisper_loss=0.1048, over 22943.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01291, ecapa_loss=0.0003524, whisper_loss=0.1016, over 3848364.59 frames. ], batch size: 91, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:47:21,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.890e+01 3.143e+01 3.810e+01 5.998e+01, threshold=6.286e+01, percent-clipped=0.0 2024-08-09 19:47:31,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=167080.0, ans=0.125 2024-08-09 19:47:33,938 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 19:47:36,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=167080.0, ans=0.0 2024-08-09 19:47:42,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=167080.0, ans=0.5 2024-08-09 19:47:45,944 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 19:47:46,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167180.0, ans=0.1 2024-08-09 19:48:11,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=167380.0, ans=0.0 2024-08-09 19:48:13,624 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-09 19:48:16,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=167380.0, ans=0.0 2024-08-09 19:48:23,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2250, loss[loss=0.1282, beats_loss=0.0116, ecapa_loss=0.0003691, whisper_loss=0.1129, over 22189.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01287, ecapa_loss=0.0003548, whisper_loss=0.102, over 3844878.90 frames. ], batch size: 87, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:48:27,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=167480.0, ans=0.2 2024-08-09 19:48:29,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167480.0, ans=0.1 2024-08-09 19:48:35,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167580.0, ans=0.125 2024-08-09 19:48:43,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167580.0, ans=0.1 2024-08-09 19:48:47,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167580.0, ans=0.1 2024-08-09 19:48:53,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167680.0, ans=0.125 2024-08-09 19:49:15,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=167880.0, ans=0.09899494936611666 2024-08-09 19:49:19,695 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 19:49:23,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=167880.0, ans=0.125 2024-08-09 19:49:28,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2300, loss[loss=0.1483, beats_loss=0.007581, ecapa_loss=0.0004064, whisper_loss=0.1367, over 17539.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01287, ecapa_loss=0.0003551, whisper_loss=0.1022, over 3860185.73 frames. ], batch size: 67, lr: 2.82e-02, grad_scale: 8192.0 2024-08-09 19:49:31,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 3.098e+01 3.355e+01 3.897e+01 6.798e+01, threshold=6.710e+01, percent-clipped=2.0 2024-08-09 19:49:43,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=168080.0, ans=0.125 2024-08-09 19:49:45,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-08-09 19:49:45,818 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-09 19:49:48,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=168080.0, ans=0.125 2024-08-09 19:49:49,525 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-09 19:49:53,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=168180.0, ans=0.125 2024-08-09 19:49:57,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=168180.0, ans=0.125 2024-08-09 19:50:01,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=168180.0, ans=0.125 2024-08-09 19:50:07,998 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-09 19:50:08,279 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 19:50:16,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-08-09 19:50:19,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=168280.0, ans=0.04949747468305833 2024-08-09 19:50:22,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=168380.0, ans=0.125 2024-08-09 19:50:27,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168380.0, ans=0.125 2024-08-09 19:50:34,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2350, loss[loss=0.1121, beats_loss=0.01242, ecapa_loss=0.0004672, whisper_loss=0.09502, over 21815.00 frames. ], tot_loss[loss=0.1196, beats_loss=0.01267, ecapa_loss=0.0003586, whisper_loss=0.1033, over 3860850.60 frames. ], batch size: 92, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:51:00,127 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-09 19:51:32,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:33,232 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 19:51:36,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:37,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=168880.0, ans=0.125 2024-08-09 19:51:41,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=168880.0, ans=0.0 2024-08-09 19:51:43,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2400, loss[loss=0.08178, beats_loss=0.01865, ecapa_loss=0.000294, whisper_loss=0.06018, over 23168.00 frames. ], tot_loss[loss=0.1193, beats_loss=0.01268, ecapa_loss=0.0003564, whisper_loss=0.103, over 3855062.59 frames. ], batch size: 98, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:51:46,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 2.941e+01 3.344e+01 3.819e+01 6.517e+01, threshold=6.689e+01, percent-clipped=0.0 2024-08-09 19:51:46,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=168980.0, ans=0.125 2024-08-09 19:51:50,005 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-09 19:51:55,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=169080.0, ans=0.0 2024-08-09 19:52:09,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=169180.0, ans=0.125 2024-08-09 19:52:21,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2024-08-09 19:52:41,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=169380.0, ans=0.0 2024-08-09 19:52:42,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=169380.0, ans=0.125 2024-08-09 19:52:44,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=22.5 2024-08-09 19:52:50,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2450, loss[loss=0.102, beats_loss=0.01273, ecapa_loss=0.0002451, whisper_loss=0.08684, over 15427.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01271, ecapa_loss=0.0003539, whisper_loss=0.1025, over 3869424.96 frames. ], batch size: 59, lr: 2.81e-02, grad_scale: 8192.0 2024-08-09 19:52:57,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-09 19:53:25,675 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 19:53:29,922 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.592e-02 2024-08-09 19:53:48,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=169880.0, ans=0.0 2024-08-09 19:54:00,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2500, loss[loss=0.1299, beats_loss=0.01212, ecapa_loss=0.0003473, whisper_loss=0.1143, over 20645.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01281, ecapa_loss=0.0003543, whisper_loss=0.102, over 3891722.63 frames. ], batch size: 78, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:54:03,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.848e+01 3.405e+01 3.928e+01 5.880e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-09 19:54:03,276 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 11 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 19:54:09,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=169980.0, ans=0.07 2024-08-09 19:54:10,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-09 19:54:34,377 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.329e-01 2024-08-09 19:54:39,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=170180.0, ans=0.125 2024-08-09 19:54:42,646 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 19:54:48,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.00 vs. 
limit=12.0 2024-08-09 19:54:55,270 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-09 19:55:04,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=170380.0, ans=0.125 2024-08-09 19:55:06,894 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-09 19:55:10,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=170380.0, ans=0.125 2024-08-09 19:55:12,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2550, loss[loss=0.1057, beats_loss=0.01478, ecapa_loss=0.0002944, whisper_loss=0.08794, over 23973.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01292, ecapa_loss=0.0003508, whisper_loss=0.1016, over 3918864.28 frames. ], batch size: 94, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:55:25,272 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 19:55:35,629 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 19:55:44,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=170680.0, ans=0.0 2024-08-09 19:55:50,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=170680.0, ans=0.0 2024-08-09 19:56:02,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-09 19:56:18,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=170880.0, ans=0.2 2024-08-09 19:56:20,902 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-09 19:56:26,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2600, loss[loss=0.1085, beats_loss=0.01398, ecapa_loss=0.0003499, whisper_loss=0.09104, over 21040.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01285, ecapa_loss=0.0003503, whisper_loss=0.1018, over 3915610.11 frames. ], batch size: 85, lr: 2.80e-02, grad_scale: 8192.0 2024-08-09 19:56:29,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.011e+01 3.512e+01 4.102e+01 7.361e+01, threshold=7.024e+01, percent-clipped=2.0 2024-08-09 19:56:51,409 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 19:56:53,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171180.0, ans=0.125 2024-08-09 19:56:57,157 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 19:56:57,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=171180.0, ans=0.125 2024-08-09 19:56:58,508 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 19:57:04,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=171180.0, ans=0.0 2024-08-09 19:57:11,705 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 19:57:18,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=171280.0, ans=0.0 2024-08-09 19:57:20,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=171280.0, ans=0.125 2024-08-09 19:57:36,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2650, loss[loss=0.1222, beats_loss=0.01191, ecapa_loss=0.0003674, whisper_loss=0.1066, over 19885.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01283, ecapa_loss=0.0003509, whisper_loss=0.1015, over 3901620.77 frames. ], batch size: 78, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:57:39,788 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 19:57:52,834 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-09 19:58:14,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=171680.0, ans=0.2 2024-08-09 19:58:28,583 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 19:58:48,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2700, loss[loss=0.1212, beats_loss=0.0105, ecapa_loss=0.0003584, whisper_loss=0.1071, over 19442.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01285, ecapa_loss=0.0003507, whisper_loss=0.1011, over 3896604.57 frames. ], batch size: 78, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 19:58:51,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.909e+01 3.335e+01 3.725e+01 7.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-09 19:59:07,536 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-09 19:59:14,437 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 19:59:20,558 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 19:59:35,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2024-08-09 19:59:56,346 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 19:59:59,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2750, loss[loss=0.1236, beats_loss=0.01205, ecapa_loss=0.0002926, whisper_loss=0.1086, over 13868.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01284, ecapa_loss=0.0003485, whisper_loss=0.1015, over 3908566.44 frames. ], batch size: 53, lr: 2.79e-02, grad_scale: 8192.0 2024-08-09 20:00:00,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=172480.0, ans=0.07 2024-08-09 20:00:34,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2024-08-09 20:00:39,318 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 20:00:51,609 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 20:01:12,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2800, loss[loss=0.1375, beats_loss=0.01021, ecapa_loss=0.0003895, whisper_loss=0.1234, over 23628.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01284, ecapa_loss=0.0003469, whisper_loss=0.1015, over 3895283.23 frames. 
], batch size: 93, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:01:15,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 3.001e+01 3.485e+01 3.958e+01 7.033e+01, threshold=6.969e+01, percent-clipped=2.0 2024-08-09 20:01:30,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=173080.0, ans=0.2 2024-08-09 20:01:39,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173180.0, ans=0.1 2024-08-09 20:01:55,328 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-09 20:02:00,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0 2024-08-09 20:02:04,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173280.0, ans=0.125 2024-08-09 20:02:06,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.69 vs. limit=22.5 2024-08-09 20:02:20,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=173380.0, ans=0.0 2024-08-09 20:02:24,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2850, loss[loss=0.1094, beats_loss=0.01214, ecapa_loss=0.0004126, whisper_loss=0.09311, over 17830.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01287, ecapa_loss=0.0003452, whisper_loss=0.1017, over 3867632.65 frames. ], batch size: 76, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:02:39,195 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 20:02:43,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-08-09 20:02:59,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-08-09 20:03:03,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173680.0, ans=0.125 2024-08-09 20:03:07,148 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-09 20:03:09,845 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 20:03:36,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173980.0, ans=0.1 2024-08-09 20:03:36,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2900, loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.000321, whisper_loss=0.09249, over 19113.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01288, ecapa_loss=0.0003476, whisper_loss=0.1014, over 3826321.99 frames. ], batch size: 74, lr: 2.78e-02, grad_scale: 8192.0 2024-08-09 20:03:37,111 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 20:03:40,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.065e+01 3.431e+01 3.879e+01 6.098e+01, threshold=6.862e+01, percent-clipped=0.0 2024-08-09 20:04:00,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. 
limit=15.0 2024-08-09 20:04:09,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-08-09 20:04:23,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=174280.0, ans=0.1 2024-08-09 20:04:45,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=174380.0, ans=0.0 2024-08-09 20:04:46,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174480.0, ans=0.1 2024-08-09 20:04:47,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 2950, loss[loss=0.1073, beats_loss=0.01158, ecapa_loss=0.00037, whisper_loss=0.09201, over 19334.00 frames. ], tot_loss[loss=0.118, beats_loss=0.0128, ecapa_loss=0.0003501, whisper_loss=0.1017, over 3853518.61 frames. ], batch size: 75, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:04:57,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.60 vs. limit=22.5 2024-08-09 20:05:22,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=12.0 2024-08-09 20:05:45,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=174780.0, ans=10.0 2024-08-09 20:05:46,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=174780.0, ans=0.09899494936611666 2024-08-09 20:05:58,302 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-09 20:06:04,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=174880.0, ans=0.0 2024-08-09 20:06:07,620 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-09 20:06:14,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3000, loss[loss=0.1401, beats_loss=0.01272, ecapa_loss=0.0003248, whisper_loss=0.1241, over 23368.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01286, ecapa_loss=0.0003478, whisper_loss=0.102, over 3862836.92 frames. ], batch size: 90, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:06:14,465 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 20:06:58,504 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on ASR_libri: loss=0.2837, beats_loss=0, ecapa_loss=0.001014, whisper_loss=0.2736, over 922467.00 frames. 2024-08-09 20:07:17,300 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on SV_voxceleb1: loss=0.009278, beats_loss=0, ecapa_loss=0.0009278, whisper_loss=0, over 939242.00 frames. 2024-08-09 20:08:50,855 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on AT_audioset: loss=0.03024, beats_loss=0.03024, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-09 20:08:50,863 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 20:08:53,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.977e+01 3.430e+01 4.027e+01 7.550e+01, threshold=6.860e+01, percent-clipped=3.0 2024-08-09 20:09:24,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175180.0, ans=0.1 2024-08-09 20:09:34,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=175180.0, ans=0.0 2024-08-09 20:09:43,709 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 42 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-09 20:09:50,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=175280.0, ans=0.0 2024-08-09 20:10:28,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3050, loss[loss=0.1113, beats_loss=0.01509, ecapa_loss=0.0002834, whisper_loss=0.09334, over 20056.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.01285, ecapa_loss=0.0003486, whisper_loss=0.1021, over 3895403.93 frames. ], batch size: 77, lr: 2.77e-02, grad_scale: 8192.0 2024-08-09 20:10:33,490 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-09 20:10:59,664 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 20:11:20,031 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 20:11:58,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=175880.0, ans=0.125 2024-08-09 20:12:00,050 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-09 20:12:21,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3100, loss[loss=0.1311, beats_loss=0.01298, ecapa_loss=0.0003833, whisper_loss=0.1143, over 22270.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01286, ecapa_loss=0.00035, whisper_loss=0.1028, over 3912496.09 frames. ], batch size: 88, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:12:25,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.112e+01 3.600e+01 4.119e+01 8.540e+01, threshold=7.200e+01, percent-clipped=4.0 2024-08-09 20:12:35,795 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.341e+03 2024-08-09 20:12:48,940 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-09 20:13:12,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=176180.0, ans=0.125 2024-08-09 20:13:21,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=176180.0, ans=0.125 2024-08-09 20:13:25,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=176280.0, ans=0.125 2024-08-09 20:13:59,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=176380.0, ans=0.0 2024-08-09 20:14:08,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3150, loss[loss=0.1212, beats_loss=0.01253, ecapa_loss=0.0003187, whisper_loss=0.1055, over 23007.00 frames. ], tot_loss[loss=0.119, beats_loss=0.0129, ecapa_loss=0.0003486, whisper_loss=0.1026, over 3906480.87 frames. ], batch size: 89, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:14:15,359 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-09 20:14:50,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.44 vs. limit=22.5 2024-08-09 20:14:52,359 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 20:15:01,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=176680.0, ans=0.2 2024-08-09 20:15:06,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2024-08-09 20:15:12,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=176680.0, ans=0.95 2024-08-09 20:15:16,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=176780.0, ans=0.125 2024-08-09 20:15:22,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.93 vs. limit=15.0 2024-08-09 20:15:29,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2024-08-09 20:15:33,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=176880.0, ans=0.125 2024-08-09 20:15:42,180 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 20:15:46,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3200, loss[loss=0.1055, beats_loss=0.01327, ecapa_loss=0.000385, whisper_loss=0.08839, over 16113.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01279, ecapa_loss=0.0003496, whisper_loss=0.1025, over 3873258.20 frames. 
], batch size: 65, lr: 2.76e-02, grad_scale: 8192.0 2024-08-09 20:15:46,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=176980.0, ans=10.0 2024-08-09 20:15:49,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.848e+01 3.292e+01 3.822e+01 6.429e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-09 20:16:08,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=177080.0, ans=0.0 2024-08-09 20:16:23,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177180.0, ans=0.0 2024-08-09 20:16:23,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2024-08-09 20:16:24,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=177180.0, ans=0.125 2024-08-09 20:16:33,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-09 20:16:48,069 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 25 from LS+wenet, 20 from Vox, 51 from AS 2024-08-09 20:16:57,761 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 18 from Vox, 43 from AS 2024-08-09 20:17:00,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3250, loss[loss=0.1146, beats_loss=0.01509, ecapa_loss=0.0002997, whisper_loss=0.09647, over 22513.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01287, ecapa_loss=0.0003477, whisper_loss=0.1022, over 3884510.94 frames. 
], batch size: 87, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:17:41,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=177680.0, ans=0.0 2024-08-09 20:17:50,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177780.0, ans=0.1 2024-08-09 20:18:09,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=177880.0, ans=0.125 2024-08-09 20:18:14,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3300, loss[loss=0.1274, beats_loss=0.01501, ecapa_loss=0.0003162, whisper_loss=0.1092, over 21452.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01284, ecapa_loss=0.0003481, whisper_loss=0.1026, over 3878607.16 frames. ], batch size: 85, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:18:18,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.080e+01 3.504e+01 4.263e+01 7.840e+01, threshold=7.009e+01, percent-clipped=4.0 2024-08-09 20:18:21,297 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 from AS 2024-08-09 20:18:22,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=177980.0, ans=0.07 2024-08-09 20:18:28,856 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS 2024-08-09 20:18:33,405 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-09 20:18:39,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. 
limit=15.0 2024-08-09 20:18:44,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=178180.0, ans=0.125 2024-08-09 20:18:49,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=178180.0, ans=0.0 2024-08-09 20:19:02,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=178280.0, ans=0.2 2024-08-09 20:19:25,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=178380.0, ans=0.125 2024-08-09 20:19:36,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3350, loss[loss=0.121, beats_loss=0.008539, ecapa_loss=0.0004626, whisper_loss=0.1078, over 17063.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01277, ecapa_loss=0.0003483, whisper_loss=0.1029, over 3888612.52 frames. ], batch size: 71, lr: 2.75e-02, grad_scale: 8192.0 2024-08-09 20:19:36,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=178480.0, ans=0.125 2024-08-09 20:19:54,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=178580.0, ans=0.125 2024-08-09 20:20:02,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=178580.0, ans=0.125 2024-08-09 20:20:09,111 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 18 from LS+wenet, 30 from Vox, 49 from AS 2024-08-09 20:20:12,655 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 20:20:17,725 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 from AS 2024-08-09 20:20:28,744 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 31 from Vox, 35 from AS 2024-08-09 20:20:30,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=178780.0, ans=0.125 2024-08-09 20:20:33,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=178780.0, ans=0.125 2024-08-09 20:20:35,256 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.719e+00 2024-08-09 20:20:35,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=178780.0, ans=0.0 2024-08-09 20:20:42,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2024-08-09 20:20:47,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-08-09 20:20:58,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3400, loss[loss=0.1327, beats_loss=0.0117, ecapa_loss=0.0002771, whisper_loss=0.1182, over 23746.00 frames. ], tot_loss[loss=0.119, beats_loss=0.01276, ecapa_loss=0.0003491, whisper_loss=0.1028, over 3877494.51 frames. 
], batch size: 89, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:21:00,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 2.994e+01 3.327e+01 4.294e+01 6.950e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 20:21:18,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=179080.0, ans=0.2 2024-08-09 20:21:20,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179080.0, ans=0.1 2024-08-09 20:21:20,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=179080.0, ans=0.125 2024-08-09 20:21:31,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179180.0, ans=0.1 2024-08-09 20:21:40,673 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS 2024-08-09 20:21:57,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=179280.0, ans=0.125 2024-08-09 20:21:59,868 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 18 from Vox, 41 from AS 2024-08-09 20:22:16,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2024-08-09 20:22:20,590 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 from AS 2024-08-09 20:22:21,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3450, loss[loss=0.1131, beats_loss=0.01168, ecapa_loss=0.0004217, whisper_loss=0.09725, over 17400.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01278, ecapa_loss=0.0003489, whisper_loss=0.1023, over 3890751.55 frames. 
], batch size: 70, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:22:28,302 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:22:33,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179480.0, ans=0.1 2024-08-09 20:22:38,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-08-09 20:22:58,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=179680.0, ans=0.125 2024-08-09 20:23:11,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-08-09 20:23:22,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=179780.0, ans=0.2 2024-08-09 20:23:23,756 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 from AS 2024-08-09 20:23:32,414 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 20:23:43,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3500, loss[loss=0.1183, beats_loss=0.01296, ecapa_loss=0.0002816, whisper_loss=0.1025, over 18351.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01275, ecapa_loss=0.0003468, whisper_loss=0.1026, over 3902823.43 frames. 
], batch size: 74, lr: 2.74e-02, grad_scale: 8192.0 2024-08-09 20:23:47,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.966e+01 3.324e+01 3.987e+01 6.193e+01, threshold=6.648e+01, percent-clipped=0.0 2024-08-09 20:23:57,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=179980.0, ans=0.07 2024-08-09 20:24:14,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=180080.0, ans=0.0 2024-08-09 20:24:17,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=180180.0, ans=0.125 2024-08-09 20:24:27,346 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 from AS 2024-08-09 20:24:37,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=180280.0, ans=0.125 2024-08-09 20:24:49,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=180280.0, ans=0.125 2024-08-09 20:24:56,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-08-09 20:25:08,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3550, loss[loss=0.1031, beats_loss=0.01443, ecapa_loss=0.0003441, whisper_loss=0.08519, over 16881.00 frames. ], tot_loss[loss=0.1189, beats_loss=0.01271, ecapa_loss=0.0003481, whisper_loss=0.1027, over 3892482.57 frames. ], batch size: 69, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:25:13,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.93 vs. 
limit=22.5 2024-08-09 20:25:16,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=180480.0, ans=0.2 2024-08-09 20:25:42,886 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 from AS 2024-08-09 20:25:59,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=180780.0, ans=0.0 2024-08-09 20:26:31,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=15.0 2024-08-09 20:26:35,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3600, loss[loss=0.1134, beats_loss=0.01549, ecapa_loss=0.0002594, whisper_loss=0.0953, over 17938.00 frames. ], tot_loss[loss=0.1191, beats_loss=0.01277, ecapa_loss=0.0003449, whisper_loss=0.1029, over 3889204.99 frames. ], batch size: 69, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:26:38,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.970e+01 3.508e+01 4.140e+01 6.583e+01, threshold=7.015e+01, percent-clipped=0.0 2024-08-09 20:27:07,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=181180.0, ans=0.0 2024-08-09 20:27:09,542 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 17 from LS+wenet, 17 from Vox, 46 from AS 2024-08-09 20:27:44,833 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 29 from Vox, 28 from AS 2024-08-09 20:27:53,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=181380.0, ans=0.0 2024-08-09 20:27:53,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.44 vs. 
limit=15.0 2024-08-09 20:27:55,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=181480.0, ans=0.125 2024-08-09 20:27:56,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3650, loss[loss=0.1113, beats_loss=0.01354, ecapa_loss=0.0003309, whisper_loss=0.09447, over 22111.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01278, ecapa_loss=0.0003446, whisper_loss=0.1023, over 3862684.71 frames. ], batch size: 91, lr: 2.73e-02, grad_scale: 16384.0 2024-08-09 20:28:03,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-09 20:28:04,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=181480.0, ans=0.125 2024-08-09 20:28:27,712 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS 2024-08-09 20:28:40,941 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 from AS 2024-08-09 20:28:48,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2024-08-09 20:29:05,967 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 from AS 2024-08-09 20:29:19,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3700, loss[loss=0.1177, beats_loss=0.01641, ecapa_loss=0.0003058, whisper_loss=0.09825, over 20798.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01278, ecapa_loss=0.0003449, whisper_loss=0.1023, over 3872404.20 frames. 
], batch size: 84, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:29:22,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.937e+01 3.354e+01 4.017e+01 7.791e+01, threshold=6.707e+01, percent-clipped=1.0 2024-08-09 20:29:45,373 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 from AS 2024-08-09 20:30:00,971 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 from AS 2024-08-09 20:30:11,315 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS 2024-08-09 20:30:36,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=182380.0, ans=0.1 2024-08-09 20:30:39,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3750, loss[loss=0.1108, beats_loss=0.01417, ecapa_loss=0.0003215, whisper_loss=0.09339, over 22684.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01282, ecapa_loss=0.0003439, whisper_loss=0.1022, over 3867503.95 frames. ], batch size: 90, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:30:39,507 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 from AS 2024-08-09 20:31:06,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.51 vs. limit=15.0 2024-08-09 20:31:21,262 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 from AS 2024-08-09 20:31:29,802 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-09 20:31:40,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=182780.0, ans=0.125 2024-08-09 20:31:42,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=29.24 vs. 
limit=22.5 2024-08-09 20:31:45,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=182880.0, ans=0.0 2024-08-09 20:31:49,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=182880.0, ans=0.2 2024-08-09 20:31:59,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3800, loss[loss=0.1029, beats_loss=0.01535, ecapa_loss=0.0003784, whisper_loss=0.0838, over 22245.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01295, ecapa_loss=0.0003469, whisper_loss=0.1017, over 3868841.37 frames. ], batch size: 93, lr: 2.72e-02, grad_scale: 16384.0 2024-08-09 20:32:00,835 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 from AS 2024-08-09 20:32:01,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.977e+01 3.395e+01 3.964e+01 6.825e+01, threshold=6.789e+01, percent-clipped=1.0 2024-08-09 20:32:25,690 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 from AS 2024-08-09 20:32:40,079 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 from AS 2024-08-09 20:32:40,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0 2024-08-09 20:32:45,115 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-09 20:32:49,260 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 23 from Vox, 22 from AS 2024-08-09 20:33:14,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183480.0, ans=0.1 2024-08-09 20:33:16,045 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3850, loss[loss=0.1356, beats_loss=0.0132, ecapa_loss=0.0003315, whisper_loss=0.1191, over 22290.00 frames. 
], tot_loss[loss=0.1183, beats_loss=0.01295, ecapa_loss=0.0003468, whisper_loss=0.1019, over 3868985.36 frames. ], batch size: 88, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:33:33,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.98 vs. limit=22.5 2024-08-09 20:33:34,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=183580.0, ans=0.125 2024-08-09 20:34:13,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=183780.0, ans=0.125 2024-08-09 20:34:35,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3900, loss[loss=0.1281, beats_loss=0.01227, ecapa_loss=0.0003806, whisper_loss=0.1121, over 21913.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.013, ecapa_loss=0.0003503, whisper_loss=0.1016, over 3892032.58 frames. ], batch size: 92, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:34:38,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.932e+01 3.278e+01 3.846e+01 7.989e+01, threshold=6.556e+01, percent-clipped=2.0 2024-08-09 20:35:00,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184080.0, ans=0.1 2024-08-09 20:35:03,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-09 20:35:06,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. 
limit=6.0 2024-08-09 20:35:16,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=184180.0, ans=0.0 2024-08-09 20:35:24,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=184280.0, ans=0.0 2024-08-09 20:35:28,496 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 from AS 2024-08-09 20:35:56,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 3950, loss[loss=0.1137, beats_loss=0.01198, ecapa_loss=0.0004156, whisper_loss=0.09757, over 14483.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01294, ecapa_loss=0.0003503, whisper_loss=0.1022, over 3913035.69 frames. ], batch size: 61, lr: 2.71e-02, grad_scale: 16384.0 2024-08-09 20:36:04,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=184480.0, ans=0.125 2024-08-09 20:36:13,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=12.0 2024-08-09 20:36:18,116 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 11 from Vox, 33 from AS 2024-08-09 20:36:42,400 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 20:36:44,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=184780.0, ans=0.0 2024-08-09 20:36:57,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184880.0, ans=0.125 2024-08-09 20:37:08,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. 
limit=22.5 2024-08-09 20:37:14,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4000, loss[loss=0.09298, beats_loss=0.01477, ecapa_loss=0.0003762, whisper_loss=0.07444, over 17946.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01284, ecapa_loss=0.0003514, whisper_loss=0.1023, over 3881295.06 frames. ], batch size: 76, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:37:17,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+01 2.965e+01 3.379e+01 3.827e+01 6.548e+01, threshold=6.758e+01, percent-clipped=0.0 2024-08-09 20:37:47,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=185180.0, ans=0.0 2024-08-09 20:37:56,050 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 20:38:04,877 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 from AS 2024-08-09 20:38:10,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=185280.0, ans=0.125 2024-08-09 20:38:15,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=185380.0, ans=0.125 2024-08-09 20:38:18,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=185380.0, ans=0.125 2024-08-09 20:38:20,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=185380.0, ans=0.0 2024-08-09 20:38:30,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4050, loss[loss=0.1213, beats_loss=0.01287, ecapa_loss=0.0003303, whisper_loss=0.1051, over 23290.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01278, ecapa_loss=0.0003535, whisper_loss=0.1022, over 3885987.98 frames. 
], batch size: 93, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:38:36,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=185480.0, ans=0.125 2024-08-09 20:38:42,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-09 20:38:55,172 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 from AS 2024-08-09 20:38:56,636 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-09 20:38:58,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=185680.0, ans=0.2 2024-08-09 20:39:12,662 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 from AS 2024-08-09 20:39:20,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=185780.0, ans=0.125 2024-08-09 20:39:20,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=185780.0, ans=0.125 2024-08-09 20:39:26,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2024-08-09 20:39:39,474 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4100, loss[loss=0.1265, beats_loss=0.01202, ecapa_loss=0.0004585, whisper_loss=0.1099, over 21991.00 frames. ], tot_loss[loss=0.1188, beats_loss=0.01273, ecapa_loss=0.0003516, whisper_loss=0.1025, over 3897687.40 frames. 
], batch size: 92, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:39:42,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.357e+01 3.015e+01 3.336e+01 4.132e+01 1.372e+02, threshold=6.672e+01, percent-clipped=1.0 2024-08-09 20:39:48,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=185980.0, ans=0.0 2024-08-09 20:40:13,266 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 14 from Vox, 31 from AS 2024-08-09 20:40:18,279 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 23 from Vox, 45 from AS 2024-08-09 20:40:26,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=186280.0, ans=0.2 2024-08-09 20:40:34,239 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 from AS 2024-08-09 20:40:45,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2024-08-09 20:40:45,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4150, loss[loss=0.1524, beats_loss=0.01013, ecapa_loss=0.0003391, whisper_loss=0.1389, over 16049.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01277, ecapa_loss=0.000351, whisper_loss=0.102, over 3895920.69 frames. ], batch size: 57, lr: 2.70e-02, grad_scale: 16384.0 2024-08-09 20:40:47,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186480.0, ans=0.1 2024-08-09 20:40:56,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=186480.0, ans=0.125 2024-08-09 20:41:08,948 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 21 from Vox, 25 from AS 2024-08-09 20:41:15,625 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 16 from Vox, 51 from AS 2024-08-09 20:41:22,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186680.0, ans=0.1 2024-08-09 20:41:28,923 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 26 from Vox, 46 from AS 2024-08-09 20:41:38,306 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 from AS 2024-08-09 20:41:42,136 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 from AS 2024-08-09 20:41:44,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=186880.0, ans=0.2 2024-08-09 20:41:45,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.93 vs. limit=10.0 2024-08-09 20:41:52,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4200, loss[loss=0.115, beats_loss=0.01387, ecapa_loss=0.0002834, whisper_loss=0.09825, over 22799.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01275, ecapa_loss=0.0003502, whisper_loss=0.1021, over 3914497.91 frames. ], batch size: 89, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:41:54,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.958e+01 3.347e+01 3.898e+01 6.800e+01, threshold=6.694e+01, percent-clipped=1.0 2024-08-09 20:42:00,568 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 30 from Vox, 27 from AS 2024-08-09 20:42:22,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=187180.0, ans=0.125 2024-08-09 20:42:38,248 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 20 from Vox, 31 from AS 2024-08-09 20:42:45,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=187380.0, ans=0.125 2024-08-09 20:42:49,029 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-09 20:42:50,339 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 from AS 2024-08-09 20:42:58,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4250, loss[loss=0.1123, beats_loss=0.01339, ecapa_loss=0.0003253, whisper_loss=0.09567, over 18541.00 frames. ], tot_loss[loss=0.1187, beats_loss=0.01268, ecapa_loss=0.000348, whisper_loss=0.1026, over 3933734.15 frames. ], batch size: 74, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:43:18,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=187580.0, ans=0.95 2024-08-09 20:43:30,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=15.0 2024-08-09 20:43:41,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-09 20:43:55,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=187880.0, ans=0.125 2024-08-09 20:43:56,228 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 from AS 2024-08-09 20:43:56,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. 
limit=15.0 2024-08-09 20:44:00,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=187880.0, ans=0.125 2024-08-09 20:44:03,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4300, loss[loss=0.1364, beats_loss=0.01253, ecapa_loss=0.0003582, whisper_loss=0.1203, over 22509.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01267, ecapa_loss=0.0003466, whisper_loss=0.1024, over 3929761.17 frames. ], batch size: 86, lr: 2.69e-02, grad_scale: 16384.0 2024-08-09 20:44:06,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.942e+01 3.508e+01 4.302e+01 6.032e+01, threshold=7.016e+01, percent-clipped=0.0 2024-08-09 20:44:20,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=188080.0, ans=0.125 2024-08-09 20:44:35,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=188180.0, ans=0.125 2024-08-09 20:44:37,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-09 20:44:50,202 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 20:45:09,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4350, loss[loss=0.1235, beats_loss=0.008805, ecapa_loss=0.0003533, whisper_loss=0.1111, over 19005.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01268, ecapa_loss=0.0003444, whisper_loss=0.1021, over 3907059.48 frames. ], batch size: 73, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:45:18,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=188480.0, ans=0.125 2024-08-09 20:45:36,891 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 20:45:42,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=188680.0, ans=0.0 2024-08-09 20:45:45,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-09 20:45:49,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-09 20:45:59,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2024-08-09 20:46:19,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-09 20:46:20,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4400, loss[loss=0.1299, beats_loss=0.01106, ecapa_loss=0.0003553, whisper_loss=0.1153, over 18913.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.0127, ecapa_loss=0.0003432, whisper_loss=0.1023, over 3895190.83 frames. ], batch size: 74, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:46:23,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.890e+01 3.311e+01 3.807e+01 6.108e+01, threshold=6.622e+01, percent-clipped=0.0 2024-08-09 20:46:34,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=189080.0, ans=0.125 2024-08-09 20:46:44,508 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 20:46:47,053 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 20:47:02,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2024-08-09 20:47:14,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189280.0, ans=0.125 2024-08-09 20:47:17,532 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-09 20:47:22,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=189380.0, ans=0.1 2024-08-09 20:47:24,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=189380.0, ans=0.125 2024-08-09 20:47:24,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=189380.0, ans=15.0 2024-08-09 20:47:29,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189380.0, ans=0.1 2024-08-09 20:47:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=189380.0, ans=0.2 2024-08-09 20:47:38,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4450, loss[loss=0.1171, beats_loss=0.01418, ecapa_loss=0.0003202, whisper_loss=0.09976, over 22608.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01277, ecapa_loss=0.0003429, whisper_loss=0.102, over 3899097.66 frames. ], batch size: 93, lr: 2.68e-02, grad_scale: 16384.0 2024-08-09 20:47:39,866 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-09 20:47:48,400 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-09 20:47:57,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=189580.0, ans=0.0 2024-08-09 20:48:04,980 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 20:48:23,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189680.0, ans=0.125 2024-08-09 20:48:23,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2024-08-09 20:48:32,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=189780.0, ans=0.0 2024-08-09 20:48:35,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=189780.0, ans=0.1 2024-08-09 20:48:37,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=189780.0, ans=0.125 2024-08-09 20:48:48,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=189880.0, ans=0.035 2024-08-09 20:48:51,822 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.188e-01 2024-08-09 20:48:52,707 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-09 20:48:52,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=189880.0, ans=0.2 2024-08-09 20:49:02,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4500, loss[loss=0.1353, beats_loss=0.01162, ecapa_loss=0.0002884, whisper_loss=0.1208, over 24701.00 frames. 
], tot_loss[loss=0.1182, beats_loss=0.01287, ecapa_loss=0.0003415, whisper_loss=0.1019, over 3911638.92 frames. ], batch size: 92, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:49:06,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.955e+01 3.431e+01 3.879e+01 5.998e+01, threshold=6.863e+01, percent-clipped=0.0 2024-08-09 20:49:19,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=190080.0, ans=0.125 2024-08-09 20:49:29,020 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-09 20:49:38,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=190180.0, ans=0.125 2024-08-09 20:49:44,046 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 10 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 20:49:51,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=190280.0, ans=0.125 2024-08-09 20:50:06,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=190380.0, ans=0.125 2024-08-09 20:50:22,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=190380.0, ans=0.125 2024-08-09 20:50:24,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4550, loss[loss=0.1121, beats_loss=0.01268, ecapa_loss=0.0004384, whisper_loss=0.09502, over 16070.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.0129, ecapa_loss=0.0003418, whisper_loss=0.1015, over 3883496.60 frames. 
], batch size: 64, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:50:29,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=190480.0, ans=0.5 2024-08-09 20:50:45,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=190580.0, ans=0.125 2024-08-09 20:50:48,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=190580.0, ans=0.125 2024-08-09 20:50:51,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2024-08-09 20:51:24,384 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-09 20:51:27,837 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 20:51:29,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-08-09 20:51:45,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4600, loss[loss=0.1199, beats_loss=0.0135, ecapa_loss=0.0003035, whisper_loss=0.1034, over 22488.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01285, ecapa_loss=0.0003422, whisper_loss=0.1012, over 3914225.46 frames. ], batch size: 89, lr: 2.67e-02, grad_scale: 16384.0 2024-08-09 20:51:48,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 2.933e+01 3.481e+01 4.250e+01 8.633e+01, threshold=6.961e+01, percent-clipped=3.0 2024-08-09 20:52:05,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. 
limit=15.0 2024-08-09 20:52:17,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2024-08-09 20:53:03,590 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 20:53:05,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4650, loss[loss=0.1269, beats_loss=0.01202, ecapa_loss=0.0003915, whisper_loss=0.111, over 22448.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01308, ecapa_loss=0.0003416, whisper_loss=0.09928, over 3910688.94 frames. ], batch size: 93, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:53:05,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=191480.0, ans=0.125 2024-08-09 20:53:11,612 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-09 20:53:13,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.17 vs. limit=22.5 2024-08-09 20:53:26,441 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 20:53:47,394 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-09 20:54:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=191880.0, ans=0.0 2024-08-09 20:54:25,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4700, loss[loss=0.1002, beats_loss=0.01264, ecapa_loss=0.00032, whisper_loss=0.08433, over 13903.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01308, ecapa_loss=0.0003423, whisper_loss=0.09883, over 3913067.18 frames. 
], batch size: 54, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:54:28,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 2.995e+01 3.606e+01 4.056e+01 7.854e+01, threshold=7.212e+01, percent-clipped=1.0 2024-08-09 20:54:31,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191980.0, ans=0.125 2024-08-09 20:54:35,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2024-08-09 20:54:41,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=192080.0, ans=0.2 2024-08-09 20:54:43,131 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-09 20:54:55,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=192180.0, ans=0.5 2024-08-09 20:55:13,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192280.0, ans=0.1 2024-08-09 20:55:20,184 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 20:55:25,987 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-09 20:55:34,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=192380.0, ans=0.2 2024-08-09 20:55:45,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-08-09 20:55:45,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4750, loss[loss=0.1057, beats_loss=0.01314, ecapa_loss=0.0004185, whisper_loss=0.08837, over 18902.00 frames. 
], tot_loss[loss=0.1157, beats_loss=0.01304, ecapa_loss=0.0003429, whisper_loss=0.0992, over 3913301.04 frames. ], batch size: 82, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:55:50,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2024-08-09 20:55:57,100 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-09 20:56:21,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=192680.0, ans=0.0 2024-08-09 20:56:32,293 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 20:56:35,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=192780.0, ans=0.2 2024-08-09 20:57:04,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4800, loss[loss=0.1003, beats_loss=0.01582, ecapa_loss=0.0002718, whisper_loss=0.0818, over 18745.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01311, ecapa_loss=0.0003409, whisper_loss=0.09954, over 3921303.71 frames. ], batch size: 74, lr: 2.66e-02, grad_scale: 16384.0 2024-08-09 20:57:07,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 3.258e+01 3.599e+01 4.060e+01 6.614e+01, threshold=7.198e+01, percent-clipped=0.0 2024-08-09 20:57:07,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=192980.0, ans=0.0 2024-08-09 20:57:23,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193080.0, ans=0.1 2024-08-09 20:57:26,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. 
limit=15.0 2024-08-09 20:57:33,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=193180.0, ans=0.125 2024-08-09 20:57:40,885 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 20:57:42,933 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 28 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-09 20:57:45,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=193180.0, ans=0.125 2024-08-09 20:57:45,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193180.0, ans=0.1 2024-08-09 20:58:09,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=193380.0, ans=0.0 2024-08-09 20:58:17,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4850, loss[loss=0.1078, beats_loss=0.01261, ecapa_loss=0.0003998, whisper_loss=0.09121, over 18571.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.0131, ecapa_loss=0.0003404, whisper_loss=0.1006, over 3940523.62 frames. ], batch size: 80, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:58:27,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=193480.0, ans=0.0 2024-08-09 20:58:39,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193580.0, ans=0.125 2024-08-09 20:58:42,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. 
limit=15.0 2024-08-09 20:58:44,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=193680.0, ans=0.0 2024-08-09 20:58:45,826 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 20:58:47,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=193680.0, ans=0.1 2024-08-09 20:58:56,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.66 vs. limit=15.0 2024-08-09 20:59:00,337 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 20:59:15,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=15.0 2024-08-09 20:59:22,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=193880.0, ans=0.05 2024-08-09 20:59:27,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4900, loss[loss=0.1059, beats_loss=0.0144, ecapa_loss=0.0004128, whisper_loss=0.0874, over 19061.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01301, ecapa_loss=0.0003413, whisper_loss=0.1011, over 3904397.92 frames. ], batch size: 83, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 20:59:27,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=193980.0, ans=0.125 2024-08-09 20:59:30,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. 
limit=15.0 2024-08-09 20:59:30,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.990e+01 3.252e+01 3.746e+01 5.696e+01, threshold=6.504e+01, percent-clipped=0.0 2024-08-09 20:59:55,046 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-09 20:59:56,014 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 21:00:11,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=194280.0, ans=0.07 2024-08-09 21:00:16,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=15.0 2024-08-09 21:00:18,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194280.0, ans=0.125 2024-08-09 21:00:25,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194380.0, ans=0.1 2024-08-09 21:00:36,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 4950, loss[loss=0.1139, beats_loss=0.01218, ecapa_loss=0.0003655, whisper_loss=0.09805, over 14656.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01304, ecapa_loss=0.0003408, whisper_loss=0.1011, over 3915330.34 frames. ], batch size: 60, lr: 2.65e-02, grad_scale: 16384.0 2024-08-09 21:01:22,588 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 21:01:24,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=194780.0, ans=0.125 2024-08-09 21:01:25,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.02 vs. 
limit=15.0 2024-08-09 21:01:27,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=194780.0, ans=0.07 2024-08-09 21:01:38,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=194880.0, ans=0.0 2024-08-09 21:01:43,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5000, loss[loss=0.116, beats_loss=0.01378, ecapa_loss=0.0003759, whisper_loss=0.09842, over 22219.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01287, ecapa_loss=0.0003434, whisper_loss=0.1019, over 3903720.55 frames. ], batch size: 90, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:01:46,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.882e+01 3.259e+01 3.861e+01 5.497e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-09 21:01:50,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194980.0, ans=0.1 2024-08-09 21:01:54,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=194980.0, ans=0.125 2024-08-09 21:01:55,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=194980.0, ans=0.0 2024-08-09 21:02:07,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195080.0, ans=0.1 2024-08-09 21:02:15,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195180.0, ans=0.1 2024-08-09 21:02:19,175 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 21:02:27,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=195280.0, ans=0.125 2024-08-09 21:02:38,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195380.0, ans=0.0 2024-08-09 21:02:47,667 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-09 21:02:50,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=195480.0, ans=0.0 2024-08-09 21:02:51,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5050, loss[loss=0.125, beats_loss=0.01221, ecapa_loss=0.0003399, whisper_loss=0.1094, over 17302.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01287, ecapa_loss=0.0003433, whisper_loss=0.1015, over 3856861.25 frames. ], batch size: 66, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:02:51,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=195480.0, ans=0.0 2024-08-09 21:02:57,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=195480.0, ans=0.0 2024-08-09 21:03:17,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=195680.0, ans=0.5 2024-08-09 21:03:18,989 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 21:03:23,946 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 21:03:24,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=195680.0, ans=0.125 2024-08-09 21:03:25,461 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
27 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-09 21:03:27,285 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:03:30,793 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 21:03:31,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-09 21:03:34,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=22.5 2024-08-09 21:03:47,069 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 21:03:57,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5100, loss[loss=0.1228, beats_loss=0.01447, ecapa_loss=0.0003125, whisper_loss=0.1052, over 20156.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01289, ecapa_loss=0.0003438, whisper_loss=0.1017, over 3877994.93 frames. ], batch size: 81, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:03:59,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.875e+01 3.306e+01 3.993e+01 6.485e+01, threshold=6.613e+01, percent-clipped=0.0 2024-08-09 21:04:00,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2024-08-09 21:04:14,039 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 21:04:14,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=196080.0, ans=0.1 2024-08-09 21:04:16,852 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 21:04:20,772 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-09 21:04:23,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=196180.0, ans=0.125 2024-08-09 21:04:27,938 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 21:04:52,442 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-09 21:04:58,979 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 21:05:00,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=196380.0, ans=0.125 2024-08-09 21:05:05,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5150, loss[loss=0.1619, beats_loss=0.009343, ecapa_loss=0.0003593, whisper_loss=0.1489, over 22967.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01277, ecapa_loss=0.0003408, whisper_loss=0.1021, over 3873597.11 frames. ], batch size: 88, lr: 2.64e-02, grad_scale: 16384.0 2024-08-09 21:05:11,466 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 21:05:25,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=196580.0, ans=0.125 2024-08-09 21:05:31,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0 2024-08-09 21:05:46,731 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
22 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-09 21:05:54,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196780.0, ans=0.1 2024-08-09 21:05:57,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=196780.0, ans=15.0 2024-08-09 21:05:59,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196880.0, ans=0.1 2024-08-09 21:06:02,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=196880.0, ans=15.0 2024-08-09 21:06:13,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5200, loss[loss=0.1274, beats_loss=0.01036, ecapa_loss=0.0003824, whisper_loss=0.1133, over 18062.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01279, ecapa_loss=0.0003381, whisper_loss=0.1018, over 3889662.98 frames. ], batch size: 71, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:06:16,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.861e+01 3.315e+01 3.921e+01 5.764e+01, threshold=6.630e+01, percent-clipped=0.0 2024-08-09 21:06:17,757 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-09 21:06:50,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=197180.0, ans=0.125 2024-08-09 21:07:04,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=197280.0, ans=0.0 2024-08-09 21:07:05,418 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 21:07:15,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=197380.0, ans=0.2 2024-08-09 21:07:19,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=197380.0, ans=0.2 2024-08-09 21:07:21,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5250, loss[loss=0.08449, beats_loss=0.01471, ecapa_loss=0.000315, whisper_loss=0.06663, over 14499.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01278, ecapa_loss=0.0003389, whisper_loss=0.1007, over 3856379.18 frames. ], batch size: 57, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:07:21,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=197480.0, ans=0.0 2024-08-09 21:07:24,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=197480.0, ans=0.125 2024-08-09 21:07:33,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=197580.0, ans=0.2 2024-08-09 21:07:42,183 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:07:46,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=197580.0, ans=0.125 2024-08-09 21:07:47,436 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 21:07:53,544 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.182e-01 2024-08-09 21:08:00,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.97 vs. 
limit=12.0 2024-08-09 21:08:10,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=197780.0, ans=0.2 2024-08-09 21:08:21,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=197880.0, ans=0.125 2024-08-09 21:08:29,319 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-09 21:08:30,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5300, loss[loss=0.123, beats_loss=0.01345, ecapa_loss=0.0003233, whisper_loss=0.1063, over 21873.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.0128, ecapa_loss=0.0003396, whisper_loss=0.1004, over 3870505.07 frames. ], batch size: 89, lr: 2.63e-02, grad_scale: 16384.0 2024-08-09 21:08:33,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.918e+01 3.459e+01 4.148e+01 6.900e+01, threshold=6.919e+01, percent-clipped=2.0 2024-08-09 21:08:33,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=197980.0, ans=0.125 2024-08-09 21:08:44,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198080.0, ans=0.1 2024-08-09 21:09:23,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=198280.0, ans=0.07 2024-08-09 21:09:30,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=198380.0, ans=0.0 2024-08-09 21:09:40,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5350, loss[loss=0.1408, beats_loss=0.01192, ecapa_loss=0.0003139, whisper_loss=0.1258, over 16665.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01274, ecapa_loss=0.0003383, whisper_loss=0.1007, over 3871217.20 frames. 
], batch size: 64, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:09:53,158 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-09 21:10:15,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=198680.0, ans=0.0 2024-08-09 21:10:24,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=198780.0, ans=0.0 2024-08-09 21:10:30,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=198780.0, ans=0.035 2024-08-09 21:10:30,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=198780.0, ans=0.0 2024-08-09 21:10:41,599 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 21:10:52,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5400, loss[loss=0.1272, beats_loss=0.0126, ecapa_loss=0.0003132, whisper_loss=0.1115, over 23126.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01274, ecapa_loss=0.0003355, whisper_loss=0.1009, over 3886399.19 frames. ], batch size: 90, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:10:55,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.905e+01 3.438e+01 3.898e+01 7.093e+01, threshold=6.876e+01, percent-clipped=1.0 2024-08-09 21:10:57,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=198980.0, ans=0.125 2024-08-09 21:11:01,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=198980.0, ans=0.125 2024-08-09 21:11:06,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=199080.0, ans=10.0 2024-08-09 21:11:14,087 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
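In each `optim.py` clipping line, the reported threshold equals `Clipping_scale` (2.0) times the middle value of the five grad-norm quartiles, i.e. twice the median recent gradient norm. An illustrative sketch of that relationship, derived from the logged numbers rather than from icefall's actual implementation:

```python
# Illustrative: the logged clipping threshold is clipping_scale times the
# median of the printed grad-norm quartiles [min, q25, median, q75, max].
def clipping_threshold(quartiles, clipping_scale=2.0):
    """Reproduce the threshold printed alongside grad-norm quartiles."""
    return clipping_scale * quartiles[2]

# Log line at batch 5400: quartiles 2.247e+01 2.905e+01 3.438e+01
# 3.898e+01 7.093e+01, threshold=6.876e+01
q = [2.247e+01, 2.905e+01, 3.438e+01, 3.898e+01, 7.093e+01]
assert abs(clipping_threshold(q) - 6.876e+01) < 1e-6
```

`percent-clipped` then reports how many gradients in the window exceeded that adaptive threshold; a spike such as the `2.249e+02` max at batch 5600 (percent-clipped=7.0) shows the clipping engaging.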
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 21:11:26,268 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-09 21:11:33,704 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-09 21:11:38,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=199280.0, ans=0.125 2024-08-09 21:11:39,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=199280.0, ans=0.125 2024-08-09 21:12:06,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5450, loss[loss=0.1248, beats_loss=0.01257, ecapa_loss=0.0002811, whisper_loss=0.1094, over 20794.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01278, ecapa_loss=0.0003357, whisper_loss=0.1012, over 3876159.86 frames. ], batch size: 77, lr: 2.62e-02, grad_scale: 16384.0 2024-08-09 21:12:16,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.89 vs. limit=15.0 2024-08-09 21:12:18,503 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 21:12:29,341 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-09 21:12:31,027 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.054e-01 2024-08-09 21:12:44,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=15.0 2024-08-09 21:13:01,512 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 21:13:17,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=199980.0, ans=0.0 2024-08-09 21:13:18,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5500, loss[loss=0.1231, beats_loss=0.01208, ecapa_loss=0.0003226, whisper_loss=0.1078, over 13918.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01282, ecapa_loss=0.0003344, whisper_loss=0.1014, over 3868582.48 frames. ], batch size: 54, lr: 2.61e-02, grad_scale: 16384.0 2024-08-09 21:13:19,658 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-20000.pt 2024-08-09 21:13:23,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 3.012e+01 3.355e+01 3.811e+01 5.286e+01, threshold=6.711e+01, percent-clipped=0.0 2024-08-09 21:13:24,724 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 21:14:01,227 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 21:14:23,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=200380.0, ans=0.125 2024-08-09 21:14:33,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5550, loss[loss=0.1077, beats_loss=0.01567, ecapa_loss=0.0002623, whisper_loss=0.08944, over 17750.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01282, ecapa_loss=0.0003356, whisper_loss=0.101, over 3906105.09 frames. ], batch size: 69, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:14:43,543 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
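Around the checkpoint at batch 5500 the logged `grad_scale` doubles from 16384.0 to 32768.0, the signature of AMP dynamic loss scaling: the scale grows after a run of overflow-free steps and backs off on overflow. A sketch in the style of `torch.cuda.amp.GradScaler` (the growth factor 2.0, backoff 0.5, and 2000-step growth interval are torch's documented defaults, assumed here rather than read from this run's code):

```python
# Illustrative AMP loss-scale update, matching the logged grad_scale jump
# from 16384.0 to 32768.0. Defaults mirror torch.cuda.amp.GradScaler.
def update_grad_scale(scale, found_inf, good_steps,
                      growth_factor=2.0, backoff=0.5, growth_interval=2000):
    """Return (new_scale, new_good_step_count) after one optimizer step."""
    if found_inf:
        return scale * backoff, 0          # overflow: halve and reset
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * growth_factor, 0    # stable run: double the scale
    return scale, good_steps

scale, good = 16384.0, 1999
scale, good = update_grad_scale(scale, False, good)
assert scale == 32768.0 and good == 0
```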
27 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-09 21:14:45,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200480.0, ans=0.125 2024-08-09 21:14:45,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-09 21:14:50,714 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 21:14:50,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=200580.0, ans=0.0 2024-08-09 21:14:52,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-08-09 21:14:57,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200580.0, ans=0.125 2024-08-09 21:14:58,825 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-09 21:15:05,407 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 21:15:05,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=200680.0, ans=0.125 2024-08-09 21:15:10,447 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-09 21:15:39,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=200880.0, ans=0.1 2024-08-09 21:15:46,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5600, loss[loss=0.1068, beats_loss=0.01261, ecapa_loss=0.0003661, whisper_loss=0.09053, over 20559.00 frames. 
], tot_loss[loss=0.1174, beats_loss=0.01275, ecapa_loss=0.0003388, whisper_loss=0.1013, over 3912011.25 frames. ], batch size: 84, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:15:48,362 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-09 21:15:49,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 3.019e+01 3.603e+01 4.139e+01 2.249e+02, threshold=7.206e+01, percent-clipped=7.0 2024-08-09 21:15:56,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=200980.0, ans=0.125 2024-08-09 21:16:09,060 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 21:16:13,295 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 21:16:14,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=201180.0, ans=0.0 2024-08-09 21:16:15,886 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-09 21:16:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=201180.0, ans=0.2 2024-08-09 21:16:28,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201280.0, ans=0.125 2024-08-09 21:16:33,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201280.0, ans=0.1 2024-08-09 21:16:34,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. 
limit=22.5 2024-08-09 21:16:44,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=201380.0, ans=0.125 2024-08-09 21:16:56,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5650, loss[loss=0.1223, beats_loss=0.01367, ecapa_loss=0.00034, whisper_loss=0.1052, over 22769.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.0127, ecapa_loss=0.0003399, whisper_loss=0.1013, over 3922785.99 frames. ], batch size: 93, lr: 2.61e-02, grad_scale: 32768.0 2024-08-09 21:17:09,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=201580.0, ans=0.0 2024-08-09 21:17:12,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=201580.0, ans=0.125 2024-08-09 21:17:18,534 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 21:17:22,507 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-09 21:17:25,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=201680.0, ans=0.125 2024-08-09 21:17:26,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. 
limit=15.0 2024-08-09 21:17:28,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201680.0, ans=0.1 2024-08-09 21:17:29,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=201680.0, ans=0.2 2024-08-09 21:17:31,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=201680.0, ans=0.2 2024-08-09 21:17:34,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=201680.0, ans=0.125 2024-08-09 21:17:49,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=201880.0, ans=0.125 2024-08-09 21:18:03,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5700, loss[loss=0.1022, beats_loss=0.01316, ecapa_loss=0.0003683, whisper_loss=0.08538, over 21322.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01277, ecapa_loss=0.0003391, whisper_loss=0.1009, over 3924505.40 frames. ], batch size: 91, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:18:06,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+01 3.095e+01 3.448e+01 4.225e+01 7.062e+01, threshold=6.897e+01, percent-clipped=0.0 2024-08-09 21:18:33,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=202180.0, ans=0.125 2024-08-09 21:18:34,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202180.0, ans=0.2 2024-08-09 21:18:37,974 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
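The frequent `ScheduledFloat` lines record hyperparameters (dropout probabilities, skip rates, bypass scales) that change with `batch_count`. The pattern is a piecewise-linear schedule interpolated between (batch, value) breakpoints; a minimal sketch of that idea (the endpoints below are hypothetical — the real `scaling.py` class carries more machinery):

```python
# Minimal sketch of a piecewise-linear schedule like zipformer's
# ScheduledFloat: value interpolated between (batch_count, value) pairs,
# clamped at both ends. Illustrative only, not the scaling.py class.
def scheduled_float(schedule, batch_count):
    """schedule: sorted (batch, value) pairs; linear interp between them."""
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (b0, v0), (b1, v1) in zip(schedule, schedule[1:]):
        if batch_count <= b1:
            frac = (batch_count - b0) / (b1 - b0)
            return v0 + frac * (v1 - v0)
    return schedule[-1][1]

# E.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches
# (hypothetical breakpoints; at batch_count=196780 the log shows ans=0.1,
# i.e. the schedule has long since reached its final value).
sched = [(0.0, 0.3), (20000.0, 0.1)]
assert scheduled_float(sched, 196780.0) == 0.1
assert abs(scheduled_float(sched, 10000.0) - 0.2) < 1e-12
```

This explains why, nearly 200k batches into training, every logged `ans` sits at a flat end-of-schedule value (0.1, 0.125, 0.2, 0.0, ...).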
21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 21:18:46,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202280.0, ans=0.2 2024-08-09 21:18:49,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2024-08-09 21:18:57,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=12.0 2024-08-09 21:19:08,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202380.0, ans=0.1 2024-08-09 21:19:10,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5750, loss[loss=0.1107, beats_loss=0.01502, ecapa_loss=0.0003219, whisper_loss=0.09241, over 19086.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01281, ecapa_loss=0.0003393, whisper_loss=0.1007, over 3894045.45 frames. ], batch size: 77, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:19:21,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202480.0, ans=0.1 2024-08-09 21:19:25,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2024-08-09 21:19:36,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202680.0, ans=0.125 2024-08-09 21:19:39,450 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-09 21:19:43,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202680.0, ans=0.1 2024-08-09 21:19:53,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=15.0 2024-08-09 21:19:53,901 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-09 21:19:57,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202780.0, ans=0.1 2024-08-09 21:20:16,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=202880.0, ans=0.125 2024-08-09 21:20:17,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5800, loss[loss=0.08129, beats_loss=0.0146, ecapa_loss=0.0002617, whisper_loss=0.06406, over 14892.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01276, ecapa_loss=0.0003413, whisper_loss=0.1012, over 3882860.72 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32768.0 2024-08-09 21:20:20,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 3.100e+01 3.407e+01 4.370e+01 6.410e+01, threshold=6.814e+01, percent-clipped=0.0 2024-08-09 21:20:23,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=15.0 2024-08-09 21:20:28,970 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
18 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 21:20:34,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203080.0, ans=0.125 2024-08-09 21:20:37,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=15.0 2024-08-09 21:20:44,725 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-09 21:20:49,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=203180.0, ans=0.07 2024-08-09 21:20:59,368 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-09 21:21:03,314 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 21:21:09,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=203280.0, ans=0.125 2024-08-09 21:21:09,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=203280.0, ans=0.0 2024-08-09 21:21:12,675 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 21:21:14,092 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 26 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 21:21:17,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-09 21:21:23,628 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-09 21:21:24,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5850, loss[loss=0.1255, beats_loss=0.01177, ecapa_loss=0.000293, whisper_loss=0.1109, over 22617.00 frames. 
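The `Whitening` lines print a `metric vs. limit` pair per module: a statistic of how non-white (unevenly spread across channels) the activations are, with a penalty engaged only when the metric exceeds `whitening_limit`. The exact formula lives in `scaling.py`; the per-channel-variance ratio below is an illustrative stand-in for the idea, not that formula:

```python
import random
from statistics import pvariance

# Conceptual sketch of the "metric vs. limit" pattern in the Whitening
# log lines: quantify how unevenly energy is spread across channels.
# The metric here (an eigenvalue-spread-style ratio over per-channel
# variances) is an assumption for illustration, not scaling.py's formula.
def whitening_metric(feats):
    """mean(v_i^2) / mean(v_i)^2 over channel variances; 1.0 = even spread."""
    nch = len(feats[0])
    variances = [pvariance(row[c] for row in feats) for c in range(nch)]
    mean_v = sum(variances) / nch
    mean_v2 = sum(v * v for v in variances) / nch
    return mean_v2 / mean_v ** 2

random.seed(0)
white = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2000)]
skewed = [[10.0 * r[0]] + r[1:] for r in white]  # one dominant channel
assert whitening_metric(white) < 1.2    # near 1.0 for well-spread input
assert whitening_metric(skewed) > 5.0   # far from white: penalty territory
```

Lines like `metric=18.16 vs. limit=15.0` are thus modules whose activations have drifted past the allowed spread, triggering the whitening constraint.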
], tot_loss[loss=0.1173, beats_loss=0.01284, ecapa_loss=0.0003403, whisper_loss=0.1011, over 3907910.44 frames. ], batch size: 85, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:21:29,127 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 21:21:59,253 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-09 21:21:59,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=203680.0, ans=0.0 2024-08-09 21:21:59,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2024-08-09 21:22:13,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=203780.0, ans=10.0 2024-08-09 21:22:15,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=203780.0, ans=0.2 2024-08-09 21:22:31,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5900, loss[loss=0.1364, beats_loss=0.01191, ecapa_loss=0.0003523, whisper_loss=0.121, over 20948.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01286, ecapa_loss=0.0003379, whisper_loss=0.1007, over 3894230.96 frames. ], batch size: 84, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:22:32,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=12.0 2024-08-09 21:22:34,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.068e+01 3.370e+01 4.019e+01 7.434e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-09 21:22:49,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.19 vs. 
limit=15.0 2024-08-09 21:22:50,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=204080.0, ans=0.2 2024-08-09 21:22:58,498 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-09 21:23:01,153 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 21:23:01,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=204180.0, ans=0.125 2024-08-09 21:23:02,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2024-08-09 21:23:04,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=204180.0, ans=0.025 2024-08-09 21:23:19,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=204280.0, ans=0.0 2024-08-09 21:23:25,544 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 21:23:32,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.60 vs. limit=22.5 2024-08-09 21:23:39,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 5950, loss[loss=0.1096, beats_loss=0.0135, ecapa_loss=0.0003546, whisper_loss=0.09255, over 21404.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01291, ecapa_loss=0.0003371, whisper_loss=0.1002, over 3913996.76 frames. 
], batch size: 88, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:24:19,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=204780.0, ans=0.125 2024-08-09 21:24:24,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=204780.0, ans=0.125 2024-08-09 21:24:30,307 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-09 21:24:44,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6000, loss[loss=0.1143, beats_loss=0.01221, ecapa_loss=0.0003766, whisper_loss=0.09835, over 21181.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01277, ecapa_loss=0.0003363, whisper_loss=0.101, over 3904437.90 frames. ], batch size: 89, lr: 2.59e-02, grad_scale: 32768.0 2024-08-09 21:24:44,471 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 21:25:02,317 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3317, 3.7664, 3.7970, 4.1652], device='cuda:0') 2024-08-09 21:25:26,033 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on ASR_libri: loss=0.2831, beats_loss=0, ecapa_loss=0.0009654, whisper_loss=0.2734, over 922467.00 frames. 2024-08-09 21:25:44,713 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on SV_voxceleb1: loss=0.008561, beats_loss=0, ecapa_loss=0.0008561, whisper_loss=0, over 939242.00 frames. 2024-08-09 21:26:12,781 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4644, 1.2948, 1.2224, 1.4819, 0.8369, 1.4123, 1.4910, 0.7921], device='cuda:0') 2024-08-09 21:27:41,195 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on AT_audioset: loss=0.03036, beats_loss=0.03036, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
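The batch-6000 validation pass above evaluates three tasks, and each reports only its own teacher loss (ASR_libri: whisper plus ecapa with `beats_loss=0`; SV_voxceleb1: ecapa only; AT_audioset: beats only). A sketch of that per-task dispatch — task names are taken from the log, but the function itself is a hypothetical reconstruction:

```python
# Sketch of the per-task validation pattern visible in the log: each
# validation set exercises only its own teacher losses, and the rest are
# reported as 0. Task keys come from the log; the dispatch is assumed.
TASK_LOSSES = {
    "ASR_libri": ("ecapa", "whisper"),
    "SV_voxceleb1": ("ecapa",),
    "AT_audioset": ("beats",),
}

def validation_losses(task, computed):
    """Zero out the losses that the given validation task does not use."""
    active = TASK_LOSSES[task]
    return {name: (computed[name] if name in active else 0.0)
            for name in ("beats", "ecapa", "whisper")}

out = validation_losses("AT_audioset",
                        {"beats": 0.03036, "ecapa": 0.01, "whisper": 0.2})
assert out == {"beats": 0.03036, "ecapa": 0.0, "whisper": 0.0}
```

Note the ASR_libri total obeys the same weighting as training: 0.2734 + 10 × 0.0009654 ≈ 0.2831, the logged validation loss.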
2024-08-09 21:27:41,200 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 21:27:43,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.831e+01 3.333e+01 3.565e+01 5.881e+01, threshold=6.666e+01, percent-clipped=0.0 2024-08-09 21:27:45,325 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-09 21:27:49,436 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 21:27:52,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-08-09 21:27:57,272 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 26 from Vox, 13 fro AS 2024-08-09 21:28:01,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-08-09 21:28:17,646 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 21:28:24,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=205280.0, ans=0.0 2024-08-09 21:28:26,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=205280.0, ans=0.125 2024-08-09 21:28:48,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6050, loss[loss=0.137, beats_loss=0.009459, ecapa_loss=0.000419, whisper_loss=0.1234, over 18146.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01263, ecapa_loss=0.0003362, whisper_loss=0.1014, over 3859170.54 frames. 
], batch size: 73, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:28:56,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=205480.0, ans=0.125 2024-08-09 21:28:59,423 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 21:29:15,020 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-09 21:29:20,122 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-09 21:29:29,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=205780.0, ans=0.125 2024-08-09 21:29:30,883 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 21:29:39,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=205780.0, ans=0.04949747468305833 2024-08-09 21:29:46,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=205880.0, ans=0.125 2024-08-09 21:29:54,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6100, loss[loss=0.1062, beats_loss=0.01353, ecapa_loss=0.0003382, whisper_loss=0.08929, over 21680.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01269, ecapa_loss=0.000336, whisper_loss=0.1014, over 3906515.04 frames. 
], batch size: 90, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:29:57,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.058e+01 3.470e+01 4.090e+01 8.250e+01, threshold=6.939e+01, percent-clipped=1.0 2024-08-09 21:30:06,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205980.0, ans=0.1 2024-08-09 21:30:19,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2024-08-09 21:30:30,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=206180.0, ans=0.125 2024-08-09 21:30:36,728 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 32 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-09 21:30:40,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=206280.0, ans=0.0 2024-08-09 21:30:54,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=206380.0, ans=0.125 2024-08-09 21:31:03,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6150, loss[loss=0.08686, beats_loss=0.01495, ecapa_loss=0.000362, whisper_loss=0.06829, over 16639.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01261, ecapa_loss=0.000336, whisper_loss=0.1017, over 3868003.41 frames. ], batch size: 69, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:31:03,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=206480.0, ans=0.125 2024-08-09 21:31:10,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.93 vs. 
limit=15.0 2024-08-09 21:31:12,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=206480.0, ans=0.125 2024-08-09 21:31:15,036 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-09 21:31:17,694 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-09 21:31:21,675 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-09 21:31:28,561 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-09 21:31:29,886 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 21:31:31,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=206680.0, ans=0.125 2024-08-09 21:31:50,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=15.0 2024-08-09 21:32:09,391 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-09 21:32:10,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6200, loss[loss=0.1489, beats_loss=0.01247, ecapa_loss=0.00035, whisper_loss=0.1329, over 22803.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01269, ecapa_loss=0.0003342, whisper_loss=0.1015, over 3877807.60 frames. 
], batch size: 89, lr: 2.58e-02, grad_scale: 32768.0 2024-08-09 21:32:13,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 3.042e+01 3.611e+01 4.258e+01 6.640e+01, threshold=7.222e+01, percent-clipped=0.0 2024-08-09 21:32:42,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=207180.0, ans=0.125 2024-08-09 21:32:43,015 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 21:32:43,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207180.0, ans=0.1 2024-08-09 21:33:01,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=207280.0, ans=0.2 2024-08-09 21:33:16,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=207380.0, ans=0.125 2024-08-09 21:33:18,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6250, loss[loss=0.1136, beats_loss=0.01166, ecapa_loss=0.0003947, whisper_loss=0.09804, over 14276.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01274, ecapa_loss=0.0003313, whisper_loss=0.1012, over 3891269.17 frames. ], batch size: 55, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:33:23,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207480.0, ans=0.1 2024-08-09 21:33:25,848 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 37 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-09 21:33:35,343 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
17 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 21:33:43,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=207580.0, ans=15.0 2024-08-09 21:34:06,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=207780.0, ans=0.0 2024-08-09 21:34:09,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=207780.0, ans=0.0 2024-08-09 21:34:20,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=207880.0, ans=0.2 2024-08-09 21:34:27,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6300, loss[loss=0.1025, beats_loss=0.01332, ecapa_loss=0.00031, whisper_loss=0.08606, over 15477.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01274, ecapa_loss=0.0003304, whisper_loss=0.1012, over 3875198.59 frames. ], batch size: 61, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:34:30,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.893e+01 3.305e+01 3.810e+01 5.470e+01, threshold=6.610e+01, percent-clipped=0.0 2024-08-09 21:35:05,450 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 21:35:10,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=208280.0, ans=0.125 2024-08-09 21:35:11,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=208280.0, ans=0.09899494936611666 2024-08-09 21:35:31,706 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 11 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-09 21:35:35,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6350, loss[loss=0.1316, beats_loss=0.01358, ecapa_loss=0.0003642, whisper_loss=0.1144, over 21820.00 frames. 
], tot_loss[loss=0.117, beats_loss=0.01285, ecapa_loss=0.0003322, whisper_loss=0.1008, over 3870635.65 frames. ], batch size: 90, lr: 2.57e-02, grad_scale: 32768.0 2024-08-09 21:35:38,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=208480.0, ans=0.0 2024-08-09 21:35:43,648 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-09 21:35:51,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=208580.0, ans=0.125 2024-08-09 21:35:52,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=208580.0, ans=0.2 2024-08-09 21:35:53,850 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 21:35:54,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=208580.0, ans=0.2 2024-08-09 21:36:22,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208780.0, ans=0.1 2024-08-09 21:36:28,993 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.605e+00 2024-08-09 21:36:32,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=208880.0, ans=0.125 2024-08-09 21:36:34,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208880.0, ans=0.1 2024-08-09 21:36:39,142 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.331e+01 2024-08-09 21:36:44,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6400, loss[loss=0.1041, beats_loss=0.01382, ecapa_loss=0.0002244, whisper_loss=0.08802, over 18721.00 
frames. ], tot_loss[loss=0.1172, beats_loss=0.01287, ecapa_loss=0.0003319, whisper_loss=0.101, over 3896413.87 frames. ], batch size: 70, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:36:48,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 3.030e+01 3.423e+01 4.041e+01 6.749e+01, threshold=6.846e+01, percent-clipped=1.0 2024-08-09 21:37:02,141 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 21:37:03,666 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 26 from LS+wenet, 8 from Vox, 24 fro AS 2024-08-09 21:37:23,772 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 21:37:33,303 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 21:37:43,481 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 21:37:49,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=209380.0, ans=15.0 2024-08-09 21:37:54,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6450, loss[loss=0.1291, beats_loss=0.01228, ecapa_loss=0.0003917, whisper_loss=0.1129, over 22494.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01289, ecapa_loss=0.0003315, whisper_loss=0.1006, over 3899617.07 frames. ], batch size: 90, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:37:57,613 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 21:37:57,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=209480.0, ans=0.04949747468305833 2024-08-09 21:38:04,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=209480.0, ans=0.125 2024-08-09 21:38:04,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.98 vs. limit=22.5 2024-08-09 21:38:05,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=209480.0, ans=0.2 2024-08-09 21:38:08,966 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-09 21:38:20,166 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-09 21:38:42,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=209780.0, ans=0.1 2024-08-09 21:38:45,401 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-09 21:38:52,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=209880.0, ans=0.0 2024-08-09 21:38:56,817 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-09 21:39:04,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6500, loss[loss=0.1242, beats_loss=0.01293, ecapa_loss=0.0002654, whisper_loss=0.1086, over 16579.00 frames. ], tot_loss[loss=0.118, beats_loss=0.01287, ecapa_loss=0.00033, whisper_loss=0.1018, over 3911823.04 frames. 
], batch size: 64, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:39:07,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.878e+01 3.238e+01 3.656e+01 8.439e+01, threshold=6.476e+01, percent-clipped=1.0 2024-08-09 21:39:09,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=209980.0, ans=0.125 2024-08-09 21:39:11,907 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-09 21:39:13,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-08-09 21:39:17,322 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-09 21:39:20,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=210080.0, ans=0.0 2024-08-09 21:39:50,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.54 vs. limit=22.5 2024-08-09 21:39:57,902 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-09 21:39:59,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=210380.0, ans=0.0 2024-08-09 21:40:09,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-09 21:40:14,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6550, loss[loss=0.142, beats_loss=0.01258, ecapa_loss=0.0002986, whisper_loss=0.1264, over 22762.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01291, ecapa_loss=0.0003291, whisper_loss=0.102, over 3944701.86 frames. 
], batch size: 91, lr: 2.56e-02, grad_scale: 32768.0 2024-08-09 21:40:51,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210680.0, ans=0.125 2024-08-09 21:41:18,672 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 21:41:22,211 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6600, loss[loss=0.151, beats_loss=0.006168, ecapa_loss=0.0003661, whisper_loss=0.1412, over 14802.00 frames. ], tot_loss[loss=0.1186, beats_loss=0.01276, ecapa_loss=0.0003313, whisper_loss=0.1025, over 3971818.38 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:41:24,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.037e+01 3.483e+01 4.077e+01 6.253e+01, threshold=6.966e+01, percent-clipped=0.0 2024-08-09 21:41:31,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=210980.0, ans=0.125 2024-08-09 21:41:32,020 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-09 21:41:41,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=211080.0, ans=0.125 2024-08-09 21:42:15,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2024-08-09 21:42:31,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6650, loss[loss=0.1163, beats_loss=0.01398, ecapa_loss=0.000329, whisper_loss=0.099, over 21855.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01282, ecapa_loss=0.0003296, whisper_loss=0.1024, over 3981797.94 frames. 
], batch size: 90, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:42:32,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-09 21:42:35,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=211480.0, ans=0.125 2024-08-09 21:42:44,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=211580.0, ans=0.025 2024-08-09 21:42:48,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.27 vs. limit=10.0 2024-08-09 21:42:49,080 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 21:42:54,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=211580.0, ans=0.125 2024-08-09 21:42:55,587 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-09 21:42:55,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=211580.0, ans=0.95 2024-08-09 21:43:10,365 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-09 21:43:19,734 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.230e+00 2024-08-09 21:43:31,872 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-09 21:43:38,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6700, loss[loss=0.1396, beats_loss=0.008947, ecapa_loss=0.0004083, whisper_loss=0.1266, over 15843.00 frames. 
], tot_loss[loss=0.1175, beats_loss=0.0129, ecapa_loss=0.0003313, whisper_loss=0.1013, over 3967001.60 frames. ], batch size: 61, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:43:41,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.049e+01 3.429e+01 4.303e+01 7.619e+01, threshold=6.858e+01, percent-clipped=1.0 2024-08-09 21:44:06,180 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 21:44:09,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=212180.0, ans=0.2 2024-08-09 21:44:14,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=212180.0, ans=0.125 2024-08-09 21:44:33,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=212380.0, ans=0.2 2024-08-09 21:44:35,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=212380.0, ans=0.125 2024-08-09 21:44:47,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6750, loss[loss=0.1437, beats_loss=0.0111, ecapa_loss=0.0003787, whisper_loss=0.1288, over 19134.00 frames. ], tot_loss[loss=0.1183, beats_loss=0.01275, ecapa_loss=0.0003336, whisper_loss=0.1023, over 3899184.78 frames. ], batch size: 75, lr: 2.55e-02, grad_scale: 32768.0 2024-08-09 21:45:56,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6800, loss[loss=0.1193, beats_loss=0.01315, ecapa_loss=0.0003085, whisper_loss=0.1031, over 22574.00 frames. ], tot_loss[loss=0.1184, beats_loss=0.0127, ecapa_loss=0.0003328, whisper_loss=0.1023, over 3892770.49 frames. 
], batch size: 87, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:45:58,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.928e+01 3.409e+01 4.100e+01 8.566e+01, threshold=6.819e+01, percent-clipped=2.0 2024-08-09 21:46:03,391 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-09 21:46:11,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=213080.0, ans=0.0 2024-08-09 21:46:13,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=213080.0, ans=0.2 2024-08-09 21:46:22,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=213180.0, ans=0.07 2024-08-09 21:46:35,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-08-09 21:46:36,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=213280.0, ans=0.0 2024-08-09 21:46:36,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.21 vs. limit=22.5 2024-08-09 21:46:43,348 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-09 21:46:52,964 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-09 21:46:57,264 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 21:47:03,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. 
limit=15.0 2024-08-09 21:47:03,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6850, loss[loss=0.09989, beats_loss=0.01359, ecapa_loss=0.0003471, whisper_loss=0.08283, over 15085.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01274, ecapa_loss=0.0003322, whisper_loss=0.1014, over 3885785.47 frames. ], batch size: 63, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:47:06,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=213480.0, ans=0.125 2024-08-09 21:47:21,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=213580.0, ans=0.0 2024-08-09 21:47:46,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=213780.0, ans=0.125 2024-08-09 21:47:57,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=213880.0, ans=0.05 2024-08-09 21:48:10,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6900, loss[loss=0.1484, beats_loss=0.01123, ecapa_loss=0.0004002, whisper_loss=0.1332, over 22676.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01276, ecapa_loss=0.0003342, whisper_loss=0.1015, over 3895244.01 frames. ], batch size: 92, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:48:13,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 3.002e+01 3.455e+01 4.166e+01 7.035e+01, threshold=6.909e+01, percent-clipped=1.0 2024-08-09 21:48:14,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2024-08-09 21:48:17,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.63 vs. 
limit=15.0 2024-08-09 21:48:21,022 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-09 21:48:34,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214080.0, ans=0.1 2024-08-09 21:48:35,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2024-08-09 21:48:36,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214080.0, ans=0.1 2024-08-09 21:48:38,522 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-09 21:48:50,718 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-09 21:48:54,510 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-09 21:48:54,885 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-09 21:49:02,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-09 21:49:17,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 6950, loss[loss=0.108, beats_loss=0.01113, ecapa_loss=0.0003174, whisper_loss=0.09367, over 13955.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01284, ecapa_loss=0.0003306, whisper_loss=0.1013, over 3912829.11 frames. 
], batch size: 53, lr: 2.54e-02, grad_scale: 32768.0 2024-08-09 21:49:29,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=214480.0, ans=0.125 2024-08-09 21:49:38,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=214580.0, ans=0.125 2024-08-09 21:49:39,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=214580.0, ans=0.025 2024-08-09 21:49:44,977 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-09 21:49:58,606 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-09 21:50:06,594 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 21:50:08,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214780.0, ans=0.125 2024-08-09 21:50:09,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214780.0, ans=0.1 2024-08-09 21:50:10,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=214880.0, ans=0.0 2024-08-09 21:50:13,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=214880.0, ans=0.125 2024-08-09 21:50:24,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7000, loss[loss=0.129, beats_loss=0.01133, ecapa_loss=0.0003777, whisper_loss=0.1138, over 21439.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01269, ecapa_loss=0.0003334, whisper_loss=0.1016, over 3873062.17 frames. 
], batch size: 89, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:50:27,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.842e+01 3.336e+01 4.058e+01 9.243e+01, threshold=6.672e+01, percent-clipped=2.0 2024-08-09 21:50:31,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=214980.0, ans=0.125 2024-08-09 21:50:41,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=215080.0, ans=0.1 2024-08-09 21:50:41,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=215080.0, ans=0.2 2024-08-09 21:51:05,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=215280.0, ans=0.125 2024-08-09 21:51:10,078 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-09 21:51:10,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=215280.0, ans=0.125 2024-08-09 21:51:26,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=215380.0, ans=0.0 2024-08-09 21:51:33,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7050, loss[loss=0.1525, beats_loss=0.009764, ecapa_loss=0.0003697, whisper_loss=0.139, over 21642.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01257, ecapa_loss=0.0003346, whisper_loss=0.1022, over 3870857.24 frames. ], batch size: 83, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:51:45,881 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 21:51:49,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. 
limit=15.0 2024-08-09 21:51:54,427 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 21:51:56,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215580.0, ans=0.1 2024-08-09 21:51:57,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=215580.0, ans=0.2 2024-08-09 21:51:59,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-09 21:52:02,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=215680.0, ans=0.125 2024-08-09 21:52:10,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=215680.0, ans=0.125 2024-08-09 21:52:17,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=215780.0, ans=0.0 2024-08-09 21:52:18,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215780.0, ans=0.1 2024-08-09 21:52:22,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=215780.0, ans=0.0 2024-08-09 21:52:41,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7100, loss[loss=0.098, beats_loss=0.01278, ecapa_loss=0.0002468, whisper_loss=0.08276, over 17539.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01266, ecapa_loss=0.0003317, whisper_loss=0.1015, over 3878240.37 frames. 
], batch size: 63, lr: 2.53e-02, grad_scale: 32768.0 2024-08-09 21:52:43,989 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.849e+01 3.267e+01 3.796e+01 6.737e+01, threshold=6.534e+01, percent-clipped=1.0 2024-08-09 21:52:44,170 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-09 21:52:53,912 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-09 21:53:07,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216180.0, ans=0.1 2024-08-09 21:53:12,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.70 vs. limit=15.0 2024-08-09 21:53:15,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=216180.0, ans=0.0 2024-08-09 21:53:21,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-08-09 21:53:38,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=216380.0, ans=0.125 2024-08-09 21:53:42,043 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 21:53:48,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7150, loss[loss=0.07787, beats_loss=0.01374, ecapa_loss=0.0003024, whisper_loss=0.06111, over 13721.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01267, ecapa_loss=0.0003295, whisper_loss=0.1013, over 3889650.63 frames. 
], batch size: 55, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:53:58,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0 2024-08-09 21:53:59,867 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.781e-02 2024-08-09 21:54:00,763 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-09 21:54:05,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=216580.0, ans=0.125 2024-08-09 21:54:15,773 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 21:54:24,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-09 21:54:25,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=216680.0, ans=0.0 2024-08-09 21:54:39,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=216780.0, ans=0.95 2024-08-09 21:54:51,261 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-09 21:54:52,329 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-09 21:54:54,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7200, loss[loss=0.1342, beats_loss=0.01199, ecapa_loss=0.000377, whisper_loss=0.1185, over 23200.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01268, ecapa_loss=0.0003313, whisper_loss=0.1016, over 3886821.72 frames. 
], batch size: 93, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:54:57,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+01 3.192e+01 3.694e+01 4.293e+01 6.634e+01, threshold=7.388e+01, percent-clipped=1.0 2024-08-09 21:55:00,171 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-09 21:55:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=216980.0, ans=0.125 2024-08-09 21:55:31,059 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-09 21:55:40,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2024-08-09 21:55:45,246 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 21:55:51,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=217380.0, ans=0.0 2024-08-09 21:55:57,308 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-09 21:56:00,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7250, loss[loss=0.1245, beats_loss=0.01332, ecapa_loss=0.0002505, whisper_loss=0.1087, over 16457.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01266, ecapa_loss=0.000329, whisper_loss=0.1017, over 3895142.91 frames. ], batch size: 64, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:56:07,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217480.0, ans=0.1 2024-08-09 21:56:29,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. 
limit=6.0 2024-08-09 21:56:38,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=217680.0, ans=0.125 2024-08-09 21:56:38,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=217680.0, ans=0.0 2024-08-09 21:56:55,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217880.0, ans=0.1 2024-08-09 21:57:02,626 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 32 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 21:57:03,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-09 21:57:05,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=217880.0, ans=0.025 2024-08-09 21:57:07,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7300, loss[loss=0.1323, beats_loss=0.009905, ecapa_loss=0.0003136, whisper_loss=0.1193, over 17089.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01265, ecapa_loss=0.0003289, whisper_loss=0.1018, over 3875204.19 frames. 
], batch size: 65, lr: 2.52e-02, grad_scale: 32768.0 2024-08-09 21:57:10,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 3.021e+01 3.524e+01 4.153e+01 7.749e+01, threshold=7.049e+01, percent-clipped=1.0 2024-08-09 21:57:32,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=218080.0, ans=0.0 2024-08-09 21:57:33,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=218180.0, ans=0.0 2024-08-09 21:57:35,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=218180.0, ans=0.125 2024-08-09 21:57:40,398 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-09 21:57:47,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2024-08-09 21:58:07,239 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-09 21:58:15,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7350, loss[loss=0.1121, beats_loss=0.01357, ecapa_loss=0.0003152, whisper_loss=0.0954, over 21646.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.0126, ecapa_loss=0.0003306, whisper_loss=0.1016, over 3856641.26 frames. 
], batch size: 88, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:58:27,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=218580.0, ans=0.1 2024-08-09 21:58:28,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=218580.0, ans=0.125 2024-08-09 21:58:49,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=218680.0, ans=0.2 2024-08-09 21:59:00,461 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-09 21:59:14,152 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-09 21:59:22,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7400, loss[loss=0.1075, beats_loss=0.01497, ecapa_loss=0.0002841, whisper_loss=0.08967, over 20461.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.0127, ecapa_loss=0.0003291, whisper_loss=0.1012, over 3856651.45 frames. ], batch size: 81, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 21:59:24,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.912e+01 3.245e+01 3.982e+01 7.444e+01, threshold=6.489e+01, percent-clipped=1.0 2024-08-09 21:59:27,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=218980.0, ans=0.0 2024-08-09 21:59:29,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=218980.0, ans=0.125 2024-08-09 21:59:39,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. 
limit=15.0 2024-08-09 21:59:44,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=219080.0, ans=0.0 2024-08-09 21:59:53,753 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 22:00:27,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7450, loss[loss=0.1264, beats_loss=0.01112, ecapa_loss=0.0003364, whisper_loss=0.1119, over 15705.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01272, ecapa_loss=0.0003309, whisper_loss=0.1012, over 3857257.95 frames. ], batch size: 61, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:00:46,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-09 22:00:55,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=219680.0, ans=0.125 2024-08-09 22:00:55,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=219680.0, ans=0.2 2024-08-09 22:01:11,793 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 22:01:32,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7500, loss[loss=0.1269, beats_loss=0.01114, ecapa_loss=0.0003385, whisper_loss=0.1124, over 21770.00 frames. ], tot_loss[loss=0.1179, beats_loss=0.01267, ecapa_loss=0.0003319, whisper_loss=0.1019, over 3857757.84 frames. 
], batch size: 84, lr: 2.51e-02, grad_scale: 32768.0 2024-08-09 22:01:34,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.195e+01 3.556e+01 4.126e+01 6.406e+01, threshold=7.112e+01, percent-clipped=0.0 2024-08-09 22:01:45,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-09 22:02:31,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=220380.0, ans=0.125 2024-08-09 22:02:38,674 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7550, loss[loss=0.1065, beats_loss=0.01378, ecapa_loss=0.0003727, whisper_loss=0.08896, over 16368.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01264, ecapa_loss=0.0003327, whisper_loss=0.1018, over 3853226.98 frames. ], batch size: 70, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:02:44,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=220480.0, ans=0.125 2024-08-09 22:02:45,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220480.0, ans=0.0 2024-08-09 22:02:56,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=220580.0, ans=0.2 2024-08-09 22:03:02,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=220580.0, ans=0.125 2024-08-09 22:03:02,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. 
limit=15.0 2024-08-09 22:03:03,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=220680.0, ans=0.2 2024-08-09 22:03:07,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220680.0, ans=0.125 2024-08-09 22:03:23,790 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 22:03:26,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=220780.0, ans=10.0 2024-08-09 22:03:29,984 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-09 22:03:38,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-09 22:03:40,405 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 22:03:43,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7600, loss[loss=0.07753, beats_loss=0.01623, ecapa_loss=0.000294, whisper_loss=0.05837, over 14656.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01262, ecapa_loss=0.0003318, whisper_loss=0.1017, over 3828562.93 frames. ], batch size: 62, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:03:46,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.898e+01 3.243e+01 3.786e+01 9.374e+01, threshold=6.487e+01, percent-clipped=2.0 2024-08-09 22:03:46,496 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 22:03:58,521 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-09 22:04:12,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=221180.0, ans=0.125 2024-08-09 22:04:22,525 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-09 22:04:27,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=221280.0, ans=0.0 2024-08-09 22:04:46,699 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-09 22:04:51,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7650, loss[loss=0.1087, beats_loss=0.01144, ecapa_loss=0.0002843, whisper_loss=0.09442, over 24035.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01269, ecapa_loss=0.0003312, whisper_loss=0.1008, over 3849603.43 frames. ], batch size: 94, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:05:08,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=221580.0, ans=0.125 2024-08-09 22:05:19,717 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.810e-02 2024-08-09 22:05:21,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=221680.0, ans=0.0 2024-08-09 22:05:33,443 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 22:05:42,929 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-09 22:05:45,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221780.0, ans=0.1 2024-08-09 22:05:57,868 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
25 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-09 22:05:58,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=221880.0, ans=0.2 2024-08-09 22:06:13,015 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7700, loss[loss=0.1231, beats_loss=0.01325, ecapa_loss=0.0003455, whisper_loss=0.1064, over 21200.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01273, ecapa_loss=0.0003318, whisper_loss=0.101, over 3877911.80 frames. ], batch size: 83, lr: 2.50e-02, grad_scale: 65536.0 2024-08-09 22:06:15,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.870e+01 3.289e+01 3.671e+01 6.131e+01, threshold=6.578e+01, percent-clipped=0.0 2024-08-09 22:06:20,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=12.0 2024-08-09 22:06:25,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=8.0 2024-08-09 22:06:40,155 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-09 22:06:44,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=222080.0, ans=0.0 2024-08-09 22:06:46,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=222080.0, ans=0.0 2024-08-09 22:07:26,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=222280.0, ans=0.125 2024-08-09 22:07:26,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.59 vs. 
limit=10.0 2024-08-09 22:07:38,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=222380.0, ans=0.1 2024-08-09 22:07:41,850 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-09 22:07:59,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7750, loss[loss=0.1279, beats_loss=0.01253, ecapa_loss=0.0002879, whisper_loss=0.1125, over 23937.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01276, ecapa_loss=0.000334, whisper_loss=0.1012, over 3880454.29 frames. ], batch size: 90, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:08:25,029 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-09 22:08:26,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=222580.0, ans=0.2 2024-08-09 22:08:38,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2024-08-09 22:08:44,044 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-09 22:08:48,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=222780.0, ans=15.0 2024-08-09 22:08:52,101 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
15 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 22:08:59,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=222780.0, ans=0.125 2024-08-09 22:09:07,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=222880.0, ans=0.07 2024-08-09 22:09:12,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=222880.0, ans=0.0 2024-08-09 22:09:14,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222880.0, ans=0.1 2024-08-09 22:09:16,389 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7800, loss[loss=0.1325, beats_loss=0.01267, ecapa_loss=0.0003085, whisper_loss=0.1168, over 19778.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01278, ecapa_loss=0.000332, whisper_loss=0.1011, over 3868942.21 frames. ], batch size: 78, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:09:19,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.196e+01 3.636e+01 4.618e+01 8.254e+01, threshold=7.273e+01, percent-clipped=2.0 2024-08-09 22:09:25,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=222980.0, ans=0.0 2024-08-09 22:09:31,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=223080.0, ans=0.125 2024-08-09 22:09:42,674 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 22:09:43,935 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-09 22:09:44,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.39 vs. 
limit=10.0 2024-08-09 22:09:50,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=223180.0, ans=0.125 2024-08-09 22:09:50,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=223180.0, ans=0.125 2024-08-09 22:10:00,807 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-09 22:10:01,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=223280.0, ans=0.1 2024-08-09 22:10:23,305 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-09 22:10:25,145 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-09 22:10:25,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-09 22:10:30,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.31 vs. limit=22.5 2024-08-09 22:10:32,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7850, loss[loss=0.1069, beats_loss=0.01585, ecapa_loss=0.0003808, whisper_loss=0.08728, over 13588.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01281, ecapa_loss=0.0003298, whisper_loss=0.1008, over 3853911.57 frames. ], batch size: 57, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:10:34,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=223480.0, ans=0.125 2024-08-09 22:10:37,553 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-09 22:10:42,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223480.0, ans=0.1 2024-08-09 22:10:56,747 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-09 22:10:58,053 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-09 22:10:58,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-08-09 22:11:31,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=223880.0, ans=0.1 2024-08-09 22:11:31,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=223880.0, ans=0.0 2024-08-09 22:11:35,315 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 22:11:43,008 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-09 22:11:47,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7900, loss[loss=0.1258, beats_loss=0.009384, ecapa_loss=0.0003979, whisper_loss=0.1124, over 15702.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01274, ecapa_loss=0.0003294, whisper_loss=0.1013, over 3865810.18 frames. ], batch size: 61, lr: 2.49e-02, grad_scale: 65536.0 2024-08-09 22:11:50,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.925e+01 3.324e+01 4.014e+01 6.320e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-09 22:11:51,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=223980.0, ans=0.0 2024-08-09 22:11:53,377 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-09 22:12:30,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=224180.0, ans=0.07 2024-08-09 22:12:35,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=224280.0, ans=0.125 2024-08-09 22:13:06,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 7950, loss[loss=0.1064, beats_loss=0.01496, ecapa_loss=0.0002689, whisper_loss=0.08879, over 19339.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01271, ecapa_loss=0.0003285, whisper_loss=0.1015, over 3903932.21 frames. ], batch size: 78, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:13:08,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-09 22:13:41,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=224680.0, ans=10.0 2024-08-09 22:13:55,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=224780.0, ans=0.125 2024-08-09 22:14:10,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=224880.0, ans=0.0 2024-08-09 22:14:10,904 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.278e-02 2024-08-09 22:14:19,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=224980.0, ans=0.2 2024-08-09 22:14:20,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8000, loss[loss=0.1264, beats_loss=0.01071, ecapa_loss=0.000379, whisper_loss=0.1119, over 17923.00 frames. 
], tot_loss[loss=0.1176, beats_loss=0.01271, ecapa_loss=0.0003258, whisper_loss=0.1016, over 3886067.24 frames. ], batch size: 73, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:14:22,407 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-09 22:14:23,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 3.124e+01 3.387e+01 3.961e+01 6.094e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-09 22:14:29,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=224980.0, ans=0.0 2024-08-09 22:14:34,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.86 vs. limit=15.0 2024-08-09 22:14:43,857 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-09 22:14:52,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-09 22:14:55,188 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-09 22:14:58,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=225180.0, ans=0.125 2024-08-09 22:15:12,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225280.0, ans=0.125 2024-08-09 22:15:28,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=225380.0, ans=0.0 2024-08-09 22:15:35,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8050, loss[loss=0.1094, beats_loss=0.01531, ecapa_loss=0.000375, whisper_loss=0.09029, over 17261.00 frames. 
], tot_loss[loss=0.1182, beats_loss=0.01265, ecapa_loss=0.0003243, whisper_loss=0.1023, over 3862804.88 frames. ], batch size: 74, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:15:53,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=225580.0, ans=0.1 2024-08-09 22:16:00,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-09 22:16:03,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=225680.0, ans=0.2 2024-08-09 22:16:03,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=15.0 2024-08-09 22:16:10,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=225680.0, ans=0.125 2024-08-09 22:16:11,395 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-09 22:16:50,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8100, loss[loss=0.1336, beats_loss=0.01053, ecapa_loss=0.0003399, whisper_loss=0.1197, over 23050.00 frames. ], tot_loss[loss=0.1175, beats_loss=0.01269, ecapa_loss=0.0003244, whisper_loss=0.1016, over 3849264.88 frames. 
], batch size: 92, lr: 2.48e-02, grad_scale: 65536.0 2024-08-09 22:16:51,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=225980.0, ans=0.025 2024-08-09 22:16:53,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.949e+01 3.347e+01 3.946e+01 6.724e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-09 22:16:55,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=225980.0, ans=0.05 2024-08-09 22:16:57,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-09 22:17:00,095 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-09 22:17:18,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.13 vs. limit=15.0 2024-08-09 22:17:20,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=226180.0, ans=0.09899494936611666 2024-08-09 22:17:21,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226180.0, ans=0.0 2024-08-09 22:17:40,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=226280.0, ans=0.0 2024-08-09 22:17:43,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.16 vs. 
limit=22.5 2024-08-09 22:17:46,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=226280.0, ans=0.125 2024-08-09 22:17:57,211 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-09 22:18:05,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=226480.0, ans=0.05 2024-08-09 22:18:05,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8150, loss[loss=0.108, beats_loss=0.01527, ecapa_loss=0.0003681, whisper_loss=0.08908, over 21060.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01266, ecapa_loss=0.0003247, whisper_loss=0.1011, over 3864119.50 frames. ], batch size: 90, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:18:09,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=226480.0, ans=0.125 2024-08-09 22:18:17,107 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-09 22:18:17,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226480.0, ans=0.125 2024-08-09 22:18:38,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=226680.0, ans=0.125 2024-08-09 22:18:45,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=226680.0, ans=0.09899494936611666 2024-08-09 22:19:23,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8200, loss[loss=0.07846, beats_loss=0.01717, ecapa_loss=0.0003413, whisper_loss=0.05787, over 20060.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01266, ecapa_loss=0.000327, whisper_loss=0.1009, over 3890920.26 frames. 
], batch size: 86, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:19:25,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 3.072e+01 3.518e+01 4.235e+01 6.207e+01, threshold=7.036e+01, percent-clipped=0.0 2024-08-09 22:19:28,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=15.0 2024-08-09 22:19:34,139 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-09 22:19:36,991 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.798e-01 2024-08-09 22:19:40,404 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-09 22:19:45,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=227080.0, ans=0.125 2024-08-09 22:19:48,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=227080.0, ans=0.125 2024-08-09 22:19:48,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227080.0, ans=0.1 2024-08-09 22:19:59,452 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-09 22:20:11,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-08-09 22:20:15,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=227280.0, ans=0.2 2024-08-09 22:20:26,714 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-09 22:20:39,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8250, loss[loss=0.1087, beats_loss=0.01255, ecapa_loss=0.000331, whisper_loss=0.09284, over 22513.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01266, ecapa_loss=0.0003282, whisper_loss=0.1006, over 3901333.94 frames. ], batch size: 92, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:41,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=227880.0, ans=0.0 2024-08-09 22:21:42,371 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 22:21:44,735 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 22:21:54,851 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 22:21:56,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8300, loss[loss=0.1037, beats_loss=0.0135, ecapa_loss=0.0002873, whisper_loss=0.0873, over 19266.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01262, ecapa_loss=0.0003264, whisper_loss=0.1009, over 3916951.13 frames. 
], batch size: 75, lr: 2.47e-02, grad_scale: 65536.0 2024-08-09 22:21:58,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=227980.0, ans=0.125 2024-08-09 22:21:59,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.849e+01 3.182e+01 3.709e+01 5.211e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-09 22:22:42,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=228280.0, ans=0.125 2024-08-09 22:22:45,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=228280.0, ans=0.0 2024-08-09 22:22:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=228280.0, ans=0.0 2024-08-09 22:22:58,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=22.5 2024-08-09 22:23:10,154 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8350, loss[loss=0.1199, beats_loss=0.0111, ecapa_loss=0.0003335, whisper_loss=0.1055, over 18644.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01257, ecapa_loss=0.0003262, whisper_loss=0.101, over 3908137.34 frames. ], batch size: 72, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:23:10,771 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 from AS 2024-08-09 22:23:16,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=228480.0, ans=0.0 2024-08-09 22:23:38,720 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS 2024-08-09 22:23:40,282 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts.
18 from LS+wenet, 23 from Vox, 24 from AS 2024-08-09 22:23:52,046 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS 2024-08-09 22:23:52,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-09 22:24:10,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=228880.0, ans=0.125 2024-08-09 22:24:12,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2024-08-09 22:24:26,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=228980.0, ans=0.125 2024-08-09 22:24:26,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8400, loss[loss=0.103, beats_loss=0.01325, ecapa_loss=0.0003429, whisper_loss=0.08634, over 20068.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.01254, ecapa_loss=0.0003267, whisper_loss=0.1018, over 3919803.24 frames. ], batch size: 81, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:24:29,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.962e+01 3.410e+01 4.213e+01 6.836e+01, threshold=6.819e+01, percent-clipped=3.0 2024-08-09 22:24:44,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=229080.0, ans=0.125 2024-08-09 22:24:47,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=229080.0, ans=15.0 2024-08-09 22:25:01,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.59 vs.
limit=22.5 2024-08-09 22:25:18,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2024-08-09 22:25:21,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229280.0, ans=0.125 2024-08-09 22:25:23,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=229280.0, ans=0.5 2024-08-09 22:25:40,580 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 from AS 2024-08-09 22:25:42,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8450, loss[loss=0.1244, beats_loss=0.008953, ecapa_loss=0.0003987, whisper_loss=0.1114, over 21950.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01252, ecapa_loss=0.0003286, whisper_loss=0.1014, over 3916631.57 frames. ], batch size: 93, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:25:42,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=229480.0, ans=0.125 2024-08-09 22:25:50,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=229480.0, ans=6.0 2024-08-09 22:26:01,021 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS 2024-08-09 22:26:06,789 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 from AS 2024-08-09 22:26:09,558 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 22:26:18,863 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
34 from LS+wenet, 17 from Vox, 41 from AS 2024-08-09 22:26:20,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229680.0, ans=0.1 2024-08-09 22:26:25,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0 2024-08-09 22:26:38,201 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 from AS 2024-08-09 22:26:41,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=229780.0, ans=0.04949747468305833 2024-08-09 22:26:46,763 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS 2024-08-09 22:26:58,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8500, loss[loss=0.0888, beats_loss=0.01381, ecapa_loss=0.000276, whisper_loss=0.07223, over 21214.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01257, ecapa_loss=0.000328, whisper_loss=0.1016, over 3947836.28 frames.
], batch size: 85, lr: 2.46e-02, grad_scale: 65536.0 2024-08-09 22:27:00,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.472e+01 3.102e+01 3.448e+01 4.001e+01 5.719e+01, threshold=6.896e+01, percent-clipped=0.0 2024-08-09 22:27:29,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230180.0, ans=0.1 2024-08-09 22:27:30,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=230180.0, ans=0.0 2024-08-09 22:27:43,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=230280.0, ans=0.0 2024-08-09 22:27:52,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=230280.0, ans=0.0 2024-08-09 22:28:03,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=230380.0, ans=0.1 2024-08-09 22:28:05,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=230380.0, ans=0.125 2024-08-09 22:28:11,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=230480.0, ans=0.0 2024-08-09 22:28:12,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8550, loss[loss=0.136, beats_loss=0.01392, ecapa_loss=0.0003401, whisper_loss=0.1187, over 22886.00 frames. ], tot_loss[loss=0.1177, beats_loss=0.01259, ecapa_loss=0.0003271, whisper_loss=0.1018, over 3952556.22 frames. 
], batch size: 91, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:28:19,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=230480.0, ans=0.2 2024-08-09 22:28:36,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=230580.0, ans=0.125 2024-08-09 22:28:41,349 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 from AS 2024-08-09 22:28:45,608 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 31 from LS+wenet, 14 from Vox, 22 from AS 2024-08-09 22:28:59,232 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 from AS 2024-08-09 22:29:00,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=230780.0, ans=0.125 2024-08-09 22:29:01,794 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 from AS 2024-08-09 22:29:03,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=230780.0, ans=0.02 2024-08-09 22:29:13,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=230880.0, ans=0.1 2024-08-09 22:29:15,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230880.0, ans=0.1 2024-08-09 22:29:15,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs.
limit=15.0 2024-08-09 22:29:20,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=230880.0, ans=0.0 2024-08-09 22:29:24,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-09 22:29:26,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8600, loss[loss=0.1249, beats_loss=0.01355, ecapa_loss=0.0003585, whisper_loss=0.1078, over 21646.00 frames. ], tot_loss[loss=0.1182, beats_loss=0.01255, ecapa_loss=0.0003266, whisper_loss=0.1024, over 3927869.97 frames. ], batch size: 90, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:29:28,219 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 12 from Vox, 33 from AS 2024-08-09 22:29:29,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.896e+01 3.419e+01 4.251e+01 8.504e+01, threshold=6.839e+01, percent-clipped=1.0 2024-08-09 22:29:33,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=230980.0, ans=0.0 2024-08-09 22:29:34,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=230980.0, ans=0.125 2024-08-09 22:29:46,617 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 from AS 2024-08-09 22:30:18,546 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
23 from LS+wenet, 26 from Vox, 42 from AS 2024-08-09 22:30:23,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=231380.0, ans=0.2 2024-08-09 22:30:33,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231380.0, ans=0.1 2024-08-09 22:30:39,384 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8650, loss[loss=0.09816, beats_loss=0.01271, ecapa_loss=0.000294, whisper_loss=0.0825, over 14237.00 frames. ], tot_loss[loss=0.1178, beats_loss=0.01259, ecapa_loss=0.0003261, whisper_loss=0.102, over 3939851.05 frames. ], batch size: 55, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:30:48,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=231480.0, ans=0.2 2024-08-09 22:30:50,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231480.0, ans=0.1 2024-08-09 22:30:56,356 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.611e-02 2024-08-09 22:31:04,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=231580.0, ans=0.0 2024-08-09 22:31:16,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.77 vs. limit=10.0 2024-08-09 22:31:18,190 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 20 from LS+wenet, 14 from Vox, 19 from AS 2024-08-09 22:31:51,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8700, loss[loss=0.114, beats_loss=0.0126, ecapa_loss=0.0003028, whisper_loss=0.0984, over 18704.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.01264, ecapa_loss=0.0003262, whisper_loss=0.1014, over 3913355.11 frames.
], batch size: 75, lr: 2.45e-02, grad_scale: 65536.0 2024-08-09 22:31:54,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.440e+01 3.097e+01 3.569e+01 4.188e+01 5.734e+01, threshold=7.139e+01, percent-clipped=0.0 2024-08-09 22:32:03,059 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 23 from Vox, 22 from AS 2024-08-09 22:32:26,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232180.0, ans=0.125 2024-08-09 22:32:41,372 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 22:32:45,452 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-09 22:32:49,532 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 16 from Vox, 34 from AS 2024-08-09 22:32:51,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=232380.0, ans=0.125 2024-08-09 22:32:52,559 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 from AS 2024-08-09 22:32:59,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2024-08-09 22:33:07,474 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8750, loss[loss=0.1231, beats_loss=0.01408, ecapa_loss=0.0003498, whisper_loss=0.1055, over 21733.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01264, ecapa_loss=0.0003265, whisper_loss=0.1011, over 3893210.96 frames. ], batch size: 90, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:34:02,887 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
23 from LS+wenet, 26 from Vox, 44 from AS 2024-08-09 22:34:07,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=15.0 2024-08-09 22:34:19,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8800, loss[loss=0.1127, beats_loss=0.01392, ecapa_loss=0.0003332, whisper_loss=0.09546, over 20615.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.0127, ecapa_loss=0.0003253, whisper_loss=0.1011, over 3883113.52 frames. ], batch size: 84, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:34:22,540 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.701e-03 2024-08-09 22:34:23,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 3.102e+01 3.612e+01 4.206e+01 6.577e+01, threshold=7.224e+01, percent-clipped=0.0 2024-08-09 22:34:31,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=232980.0, ans=0.2 2024-08-09 22:34:42,995 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 from AS 2024-08-09 22:34:49,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=233180.0, ans=0.0 2024-08-09 22:35:20,530 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 17 from Vox, 39 from AS 2024-08-09 22:35:34,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2024-08-09 22:35:34,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8850, loss[loss=0.1142, beats_loss=0.01109, ecapa_loss=0.0003215, whisper_loss=0.0999, over 17590.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01261, ecapa_loss=0.0003262, whisper_loss=0.1013, over 3896267.16 frames.
], batch size: 68, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:35:38,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=233480.0, ans=0.95 2024-08-09 22:35:43,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=233480.0, ans=0.125 2024-08-09 22:35:52,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=233580.0, ans=0.035 2024-08-09 22:36:11,883 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 from AS 2024-08-09 22:36:21,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5 2024-08-09 22:36:24,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=233780.0, ans=0.125 2024-08-09 22:36:24,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2024-08-09 22:36:31,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=233880.0, ans=0.0 2024-08-09 22:36:32,842 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 from AS 2024-08-09 22:36:45,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8900, loss[loss=0.09549, beats_loss=0.01524, ecapa_loss=0.0003262, whisper_loss=0.07698, over 15849.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01268, ecapa_loss=0.0003261, whisper_loss=0.1012, over 3883801.45 frames.
], batch size: 68, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:36:45,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.36 vs. limit=22.5 2024-08-09 22:36:47,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.807e+01 3.249e+01 3.699e+01 6.208e+01, threshold=6.498e+01, percent-clipped=0.0 2024-08-09 22:36:51,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=233980.0, ans=0.0 2024-08-09 22:36:55,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=233980.0, ans=0.2 2024-08-09 22:36:57,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=234080.0, ans=0.0 2024-08-09 22:37:09,645 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 22 from Vox, 27 from AS 2024-08-09 22:37:13,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=234180.0, ans=0.125 2024-08-09 22:37:35,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=15.0 2024-08-09 22:37:37,877 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS 2024-08-09 22:37:41,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.03 vs. limit=5.0 2024-08-09 22:37:56,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 8950, loss[loss=0.1309, beats_loss=0.01495, ecapa_loss=0.0002915, whisper_loss=0.1131, over 14551.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01265, ecapa_loss=0.0003259, whisper_loss=0.101, over 3870262.19 frames.
], batch size: 57, lr: 2.44e-02, grad_scale: 65536.0 2024-08-09 22:38:04,857 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 20 from Vox, 28 from AS 2024-08-09 22:38:48,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=234780.0, ans=0.2 2024-08-09 22:38:59,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2024-08-09 22:39:02,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=234880.0, ans=0.0 2024-08-09 22:39:04,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9000, loss[loss=0.1236, beats_loss=0.01045, ecapa_loss=0.0002802, whisper_loss=0.1104, over 14888.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01264, ecapa_loss=0.0003269, whisper_loss=0.1008, over 3864067.78 frames. ], batch size: 54, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:39:04,953 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 22:39:43,706 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on ASR_libri: loss=0.2806, beats_loss=0, ecapa_loss=0.0009572, whisper_loss=0.2711, over 922467.00 frames. 2024-08-09 22:40:01,267 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on SV_voxceleb1: loss=0.008746, beats_loss=0, ecapa_loss=0.0008746, whisper_loss=0, over 939242.00 frames. 2024-08-09 22:41:51,890 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on AT_audioset: loss=0.02976, beats_loss=0.02976, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
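Note on reading the `tot_loss[...]` records above: the overall loss is not a plain sum of the three distillation losses. Matching the scales in the run config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`), the logged totals are reproduced by a weighted sum. The sketch below is illustrative (the function name is ours, not from `train_multi_KD3.py`):

```python
# Weights taken from the run config printed at startup:
# beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0.
BEATS_SCALE = 1.0
ECAPA_SCALE = 10.0
WHISPER_SCALE = 1.0

def combine_kd_losses(beats_loss: float, ecapa_loss: float,
                      whisper_loss: float) -> float:
    """Weighted sum of the three teacher losses, as implied by the log.

    E.g. the batch-8250 record tot_loss[loss=0.1165, beats_loss=0.01266,
    ecapa_loss=0.0003282, whisper_loss=0.1006] is reproduced (to the
    4 significant digits logged) by this combination.
    """
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

total = combine_kd_losses(0.01266, 0.0003282, 0.1006)
```

This explains why `ecapa_loss` values around 3e-4 still contribute a few percent of the total: they are scaled by 10 before summation.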
2024-08-09 22:41:51,898 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-09 22:41:54,361 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 3.054e+01 3.477e+01 3.947e+01 5.844e+01, threshold=6.953e+01, percent-clipped=0.0 2024-08-09 22:41:54,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=234980.0, ans=0.125 2024-08-09 22:42:06,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=235080.0, ans=0.125 2024-08-09 22:42:45,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235280.0, ans=0.1 2024-08-09 22:42:45,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235280.0, ans=0.125 2024-08-09 22:42:50,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=235380.0, ans=0.125 2024-08-09 22:42:59,813 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 from AS 2024-08-09 22:43:04,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9050, loss[loss=0.09564, beats_loss=0.01555, ecapa_loss=0.0003429, whisper_loss=0.07666, over 19482.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01264, ecapa_loss=0.0003281, whisper_loss=0.1008, over 3880878.83 frames. ], batch size: 83, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:43:48,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=235780.0, ans=0.0 2024-08-09 22:43:48,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs.
limit=15.0 2024-08-09 22:43:56,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235780.0, ans=0.125 2024-08-09 22:44:04,170 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 from AS 2024-08-09 22:44:05,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=235880.0, ans=0.125 2024-08-09 22:44:10,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 22:44:17,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9100, loss[loss=0.1298, beats_loss=0.00915, ecapa_loss=0.0003051, whisper_loss=0.1176, over 22651.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01262, ecapa_loss=0.0003282, whisper_loss=0.1013, over 3893047.00 frames. ], batch size: 87, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:44:18,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=235980.0, ans=0.2 2024-08-09 22:44:20,404 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.942e+01 3.415e+01 3.847e+01 6.703e+01, threshold=6.829e+01, percent-clipped=0.0 2024-08-09 22:44:53,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2024-08-09 22:45:04,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.21 vs.
limit=12.0 2024-08-09 22:45:16,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=236280.0, ans=0.125 2024-08-09 22:45:17,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=236380.0, ans=0.125 2024-08-09 22:45:34,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9150, loss[loss=0.1272, beats_loss=0.009933, ecapa_loss=0.0002991, whisper_loss=0.1143, over 16884.00 frames. ], tot_loss[loss=0.1181, beats_loss=0.01248, ecapa_loss=0.0003244, whisper_loss=0.1024, over 3901163.80 frames. ], batch size: 64, lr: 2.43e-02, grad_scale: 65536.0 2024-08-09 22:45:43,401 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 from AS 2024-08-09 22:45:49,624 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 from AS 2024-08-09 22:45:56,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236580.0, ans=0.125 2024-08-09 22:45:59,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=236580.0, ans=0.0 2024-08-09 22:45:59,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=236580.0, ans=0.125 2024-08-09 22:46:07,059 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 from AS 2024-08-09 22:46:13,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.34 vs.
limit=15.0 2024-08-09 22:46:33,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236880.0, ans=0.125 2024-08-09 22:46:41,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=236880.0, ans=10.0 2024-08-09 22:46:48,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9200, loss[loss=0.1099, beats_loss=0.01136, ecapa_loss=0.0002825, whisper_loss=0.09573, over 15345.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01253, ecapa_loss=0.000325, whisper_loss=0.1016, over 3882191.35 frames. ], batch size: 57, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:46:49,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0 2024-08-09 22:46:50,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=236980.0, ans=0.125 2024-08-09 22:46:51,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=22.5 2024-08-09 22:46:51,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 2.835e+01 3.303e+01 3.887e+01 6.132e+01, threshold=6.605e+01, percent-clipped=0.0 2024-08-09 22:46:52,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=236980.0, ans=0.125 2024-08-09 22:47:00,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=236980.0, ans=0.125 2024-08-09 22:47:08,191 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 17 from Vox, 25 from AS 2024-08-09 22:47:10,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=237080.0, ans=0.125 2024-08-09 22:47:11,340 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-09 22:47:39,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=237280.0, ans=0.125 2024-08-09 22:48:04,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9250, loss[loss=0.1009, beats_loss=0.01503, ecapa_loss=0.0003509, whisper_loss=0.08232, over 21334.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01263, ecapa_loss=0.0003256, whisper_loss=0.1006, over 3883045.67 frames. ], batch size: 91, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:48:15,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=237480.0, ans=0.125 2024-08-09 22:48:21,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=237580.0, ans=0.0 2024-08-09 22:48:30,930 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 from AS 2024-08-09 22:48:39,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237680.0, ans=0.1 2024-08-09 22:48:51,432 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 from AS 2024-08-09 22:49:02,622 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 22 from Vox, 44 from AS 2024-08-09 22:49:08,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.10 vs.
limit=22.5 2024-08-09 22:49:08,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.38 vs. limit=22.5 2024-08-09 22:49:24,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9300, loss[loss=0.09994, beats_loss=0.01153, ecapa_loss=0.0003107, whisper_loss=0.08531, over 15210.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.0126, ecapa_loss=0.0003233, whisper_loss=0.1007, over 3892740.12 frames. ], batch size: 59, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:49:27,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+01 3.042e+01 3.380e+01 4.213e+01 8.159e+01, threshold=6.761e+01, percent-clipped=3.0 2024-08-09 22:49:35,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.00 vs. limit=10.0 2024-08-09 22:49:59,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=238180.0, ans=0.0 2024-08-09 22:50:06,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=238180.0, ans=10.0 2024-08-09 22:50:26,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=238380.0, ans=0.125 2024-08-09 22:50:41,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9350, loss[loss=0.1238, beats_loss=0.011, ecapa_loss=0.0003272, whisper_loss=0.1095, over 19929.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01257, ecapa_loss=0.0003227, whisper_loss=0.1013, over 3856185.34 frames. ], batch size: 78, lr: 2.42e-02, grad_scale: 65536.0 2024-08-09 22:50:48,789 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-09 22:51:49,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=238680.0, ans=10.0 2024-08-09 22:52:12,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=238780.0, ans=0.2 2024-08-09 22:52:22,465 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-09 22:52:34,432 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9400, loss[loss=0.122, beats_loss=0.01437, ecapa_loss=0.0003187, whisper_loss=0.1044, over 17924.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01257, ecapa_loss=0.0003238, whisper_loss=0.1013, over 3854824.18 frames. ], batch size: 71, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:52:36,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-08-09 22:52:37,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 2.975e+01 3.274e+01 3.809e+01 6.351e+01, threshold=6.548e+01, percent-clipped=0.0 2024-08-09 22:52:37,978 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-09 22:53:01,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=239080.0, ans=0.125 2024-08-09 22:53:17,527 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-09 22:53:20,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=239180.0, ans=0.125 2024-08-09 22:53:21,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. 
limit=15.0 2024-08-09 22:53:35,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239280.0, ans=0.1 2024-08-09 22:53:37,760 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 22:53:52,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=239380.0, ans=0.125 2024-08-09 22:53:52,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=239380.0, ans=0.1 2024-08-09 22:53:54,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=239380.0, ans=0.0 2024-08-09 22:54:05,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=239480.0, ans=0.125 2024-08-09 22:54:06,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9450, loss[loss=0.1069, beats_loss=0.01538, ecapa_loss=0.0002993, whisper_loss=0.08848, over 22113.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01272, ecapa_loss=0.0003227, whisper_loss=0.1003, over 3882348.45 frames. ], batch size: 93, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:54:25,671 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 22:54:32,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=239580.0, ans=0.125 2024-08-09 22:54:40,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=239580.0, ans=0.125 2024-08-09 22:54:46,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=239580.0, ans=0.0 2024-08-09 22:54:59,532 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 22:55:01,721 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 22:55:15,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=239780.0, ans=0.0 2024-08-09 22:55:39,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=239880.0, ans=0.125 2024-08-09 22:55:47,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-09 22:55:51,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9500, loss[loss=0.08081, beats_loss=0.01069, ecapa_loss=0.0003738, whisper_loss=0.06638, over 13638.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01265, ecapa_loss=0.0003234, whisper_loss=0.09996, over 3867185.94 frames. ], batch size: 55, lr: 2.41e-02, grad_scale: 65536.0 2024-08-09 22:55:51,848 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-09 22:55:54,338 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-24000.pt 2024-08-09 22:55:59,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.955e+01 3.513e+01 3.972e+01 7.065e+01, threshold=7.026e+01, percent-clipped=1.0 2024-08-09 22:56:34,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=240080.0, ans=0.2 2024-08-09 22:56:36,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2024-08-09 22:56:41,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=240180.0, ans=0.0 2024-08-09 22:56:46,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240180.0, ans=0.1 2024-08-09 22:56:48,778 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-09 22:56:49,923 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 11 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-09 22:56:57,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. 
limit=15.0 2024-08-09 22:57:04,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=240280.0, ans=0.125 2024-08-09 22:57:17,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240280.0, ans=0.1 2024-08-09 22:57:19,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=240280.0, ans=0.0 2024-08-09 22:57:40,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=240380.0, ans=0.125 2024-08-09 22:57:50,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9550, loss[loss=0.08941, beats_loss=0.01427, ecapa_loss=0.0003365, whisper_loss=0.07177, over 17053.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0127, ecapa_loss=0.0003247, whisper_loss=0.09942, over 3855123.97 frames. ], batch size: 71, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:58:12,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=240580.0, ans=0.125 2024-08-09 22:58:40,597 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-09 22:58:56,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=240680.0, ans=0.125 2024-08-09 22:59:12,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=240780.0, ans=0.125 2024-08-09 22:59:40,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=240880.0, ans=0.0 2024-08-09 22:59:46,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9600, loss[loss=0.096, beats_loss=0.01161, ecapa_loss=0.0003061, whisper_loss=0.08133, over 14632.00 frames. 
], tot_loss[loss=0.1163, beats_loss=0.01267, ecapa_loss=0.0003219, whisper_loss=0.1004, over 3847088.56 frames. ], batch size: 56, lr: 2.41e-02, grad_scale: 131072.0 2024-08-09 22:59:49,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.841e+01 3.249e+01 3.780e+01 5.366e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-09 23:00:18,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241080.0, ans=0.125 2024-08-09 23:00:24,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=241080.0, ans=0.125 2024-08-09 23:00:54,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241280.0, ans=0.125 2024-08-09 23:00:55,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-08-09 23:01:12,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=241280.0, ans=0.125 2024-08-09 23:01:21,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=241380.0, ans=0.125 2024-08-09 23:01:33,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9650, loss[loss=0.1268, beats_loss=0.009902, ecapa_loss=0.0004065, whisper_loss=0.1129, over 20736.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01268, ecapa_loss=0.0003232, whisper_loss=0.1006, over 3880740.49 frames. ], batch size: 85, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:01:43,865 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 29 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-09 23:01:45,466 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
18 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-09 23:01:47,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.61 vs. limit=10.0 2024-08-09 23:01:53,374 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-09 23:01:54,689 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-09 23:02:17,471 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-09 23:02:24,156 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-09 23:02:35,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=241780.0, ans=0.0 2024-08-09 23:02:51,876 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 23:02:58,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9700, loss[loss=0.1148, beats_loss=0.01173, ecapa_loss=0.0003501, whisper_loss=0.09959, over 18626.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01264, ecapa_loss=0.0003253, whisper_loss=0.1011, over 3863647.89 frames. ], batch size: 76, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:02:58,448 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-09 23:03:01,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 3.064e+01 3.484e+01 4.019e+01 6.587e+01, threshold=6.968e+01, percent-clipped=2.0 2024-08-09 23:03:15,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. 
limit=6.0 2024-08-09 23:03:33,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2024-08-09 23:04:01,286 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 23:04:05,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-08-09 23:04:18,345 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-09 23:04:21,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9750, loss[loss=0.1047, beats_loss=0.012, ecapa_loss=0.0003196, whisper_loss=0.08949, over 13895.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01263, ecapa_loss=0.0003232, whisper_loss=0.1009, over 3842945.07 frames. ], batch size: 56, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:04:26,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=242480.0, ans=0.0 2024-08-09 23:04:28,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2024-08-09 23:05:02,013 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 23:05:23,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.47 vs. limit=22.5 2024-08-09 23:05:28,882 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-09 23:05:29,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=242880.0, ans=0.0 2024-08-09 23:05:37,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=242880.0, ans=0.2 2024-08-09 23:05:41,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9800, loss[loss=0.1081, beats_loss=0.009506, ecapa_loss=0.0003957, whisper_loss=0.09466, over 17409.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01261, ecapa_loss=0.0003219, whisper_loss=0.1011, over 3830635.17 frames. ], batch size: 71, lr: 2.40e-02, grad_scale: 131072.0 2024-08-09 23:05:44,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.875e+01 3.358e+01 3.972e+01 6.084e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-09 23:05:44,519 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-09 23:06:06,961 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-09 23:06:07,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=243080.0, ans=0.125 2024-08-09 23:06:09,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=243080.0, ans=0.0 2024-08-09 23:06:12,071 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-09 23:06:23,895 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-09 23:06:24,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=243180.0, ans=0.0 2024-08-09 23:06:43,347 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-09 23:06:45,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243280.0, ans=0.1 2024-08-09 23:07:04,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243480.0, ans=0.125 2024-08-09 23:07:05,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9850, loss[loss=0.1058, beats_loss=0.01246, ecapa_loss=0.0002754, whisper_loss=0.0906, over 14524.00 frames. ], tot_loss[loss=0.1176, beats_loss=0.0125, ecapa_loss=0.000323, whisper_loss=0.1019, over 3829510.37 frames. ], batch size: 54, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:07:14,520 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-09 23:07:34,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=243580.0, ans=0.125 2024-08-09 23:07:47,934 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 23:07:54,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=243680.0, ans=0.1 2024-08-09 23:08:07,172 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-09 23:08:11,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2024-08-09 23:08:13,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.05 vs. 
limit=15.0 2024-08-09 23:08:16,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:08:24,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=243880.0, ans=0.0 2024-08-09 23:08:32,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=243980.0, ans=0.2 2024-08-09 23:08:33,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9900, loss[loss=0.138, beats_loss=0.0103, ecapa_loss=0.000416, whisper_loss=0.1235, over 21588.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01259, ecapa_loss=0.0003232, whisper_loss=0.1008, over 3833583.22 frames. ], batch size: 91, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:08:34,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=243980.0, ans=0.95 2024-08-09 23:08:36,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 3.025e+01 3.445e+01 3.906e+01 6.336e+01, threshold=6.890e+01, percent-clipped=0.0 2024-08-09 23:08:38,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=243980.0, ans=0.125 2024-08-09 23:08:47,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2024-08-09 23:09:08,089 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 23:09:14,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. 
limit=6.0 2024-08-09 23:09:19,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=15.0 2024-08-09 23:09:25,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=22.5 2024-08-09 23:09:55,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 9950, loss[loss=0.08504, beats_loss=0.0163, ecapa_loss=0.0002832, whisper_loss=0.06591, over 14448.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01256, ecapa_loss=0.0003249, whisper_loss=0.1005, over 3830620.38 frames. ], batch size: 61, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:10:00,853 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-09 23:10:01,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-08-09 23:10:45,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=244780.0, ans=0.1 2024-08-09 23:10:50,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=244780.0, ans=0.0 2024-08-09 23:11:13,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2024-08-09 23:11:18,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10000, loss[loss=0.1314, beats_loss=0.01163, ecapa_loss=0.0002497, whisper_loss=0.1172, over 20788.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01259, ecapa_loss=0.0003232, whisper_loss=0.1002, over 3837526.18 frames. 
], batch size: 79, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:11:22,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.876e+01 3.207e+01 3.745e+01 5.513e+01, threshold=6.413e+01, percent-clipped=0.0 2024-08-09 23:11:26,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=244980.0, ans=0.125 2024-08-09 23:11:37,350 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-09 23:12:04,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=12.0 2024-08-09 23:12:08,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245180.0, ans=0.0 2024-08-09 23:12:09,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=245180.0, ans=0.125 2024-08-09 23:12:50,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10050, loss[loss=0.1302, beats_loss=0.01157, ecapa_loss=0.0002669, whisper_loss=0.116, over 17797.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01268, ecapa_loss=0.0003209, whisper_loss=0.0996, over 3843228.26 frames. ], batch size: 64, lr: 2.39e-02, grad_scale: 131072.0 2024-08-09 23:12:51,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.24 vs. 
limit=22.5 2024-08-09 23:13:33,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=245680.0, ans=0.125 2024-08-09 23:13:36,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245680.0, ans=0.125 2024-08-09 23:13:42,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245680.0, ans=0.1 2024-08-09 23:13:53,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=245780.0, ans=0.0 2024-08-09 23:13:57,714 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-09 23:13:59,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245780.0, ans=0.125 2024-08-09 23:13:59,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245780.0, ans=0.1 2024-08-09 23:14:06,724 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-09 23:14:08,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=245880.0, ans=0.0 2024-08-09 23:14:11,181 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-09 23:14:14,870 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.812e-02 2024-08-09 23:14:20,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=245880.0, ans=0.125 2024-08-09 23:14:22,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=245980.0, ans=0.125 2024-08-09 23:14:24,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10100, loss[loss=0.1247, beats_loss=0.01189, ecapa_loss=0.000329, whisper_loss=0.1095, over 23100.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01264, ecapa_loss=0.0003223, whisper_loss=0.1005, over 3887262.23 frames. ], batch size: 91, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:14:28,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.998e+01 3.344e+01 3.820e+01 6.746e+01, threshold=6.687e+01, percent-clipped=3.0 2024-08-09 23:14:51,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-08-09 23:14:59,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=246180.0, ans=0.0 2024-08-09 23:15:06,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=246180.0, ans=0.2 2024-08-09 23:15:08,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246180.0, ans=0.1 2024-08-09 23:15:13,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=246280.0, ans=0.125 2024-08-09 23:15:15,259 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 23:15:37,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=246380.0, ans=0.125 2024-08-09 23:15:38,740 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-09 23:15:43,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10150, loss[loss=0.1404, beats_loss=0.008664, ecapa_loss=0.0003608, whisper_loss=0.1281, over 15711.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.0126, ecapa_loss=0.0003242, whisper_loss=0.1009, over 3897305.34 frames. ], batch size: 61, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:15:59,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246580.0, ans=0.125 2024-08-09 23:16:00,390 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-09 23:16:02,553 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 23:16:13,863 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-09 23:16:27,066 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-09 23:16:33,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.32 vs. limit=10.0 2024-08-09 23:16:34,438 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-09 23:16:51,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246880.0, ans=0.1 2024-08-09 23:16:57,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10200, loss[loss=0.1061, beats_loss=0.01312, ecapa_loss=0.0003524, whisper_loss=0.08942, over 21431.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01271, ecapa_loss=0.0003232, whisper_loss=0.1003, over 3873500.67 frames. ], batch size: 86, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:17:00,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.913e+01 3.327e+01 3.843e+01 5.703e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-09 23:17:04,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=15.0 2024-08-09 23:17:04,961 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 12 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-09 23:17:06,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=246980.0, ans=0.0 2024-08-09 23:17:13,826 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-09 23:17:18,886 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-09 23:17:20,155 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-09 23:17:46,846 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-09 23:17:47,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=247280.0, ans=0.2 2024-08-09 23:17:48,434 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
27 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-09 23:17:57,587 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 11 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 23:18:04,002 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-09 23:18:07,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=247380.0, ans=0.0 2024-08-09 23:18:08,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=247480.0, ans=0.0 2024-08-09 23:18:09,934 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10250, loss[loss=0.1476, beats_loss=0.009913, ecapa_loss=0.0003003, whisper_loss=0.1347, over 23184.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01259, ecapa_loss=0.0003221, whisper_loss=0.1011, over 3874070.46 frames. ], batch size: 88, lr: 2.38e-02, grad_scale: 131072.0 2024-08-09 23:18:18,951 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-09 23:18:21,037 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-09 23:18:22,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=247480.0, ans=0.125 2024-08-09 23:18:40,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. 
limit=6.0 2024-08-09 23:18:45,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=247680.0, ans=0.2 2024-08-09 23:18:48,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=247680.0, ans=0.125 2024-08-09 23:19:01,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=247780.0, ans=0.0 2024-08-09 23:19:15,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=247880.0, ans=0.125 2024-08-09 23:19:17,981 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-09 23:19:21,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10300, loss[loss=0.1112, beats_loss=0.01122, ecapa_loss=0.0003588, whisper_loss=0.09638, over 19709.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01258, ecapa_loss=0.0003224, whisper_loss=0.1005, over 3879951.23 frames. ], batch size: 81, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:19:25,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 3.179e+01 3.546e+01 4.118e+01 7.373e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-09 23:19:25,350 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-09 23:19:25,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-08-09 23:19:38,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.41 vs. 
limit=12.0 2024-08-09 23:19:39,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=248080.0, ans=0.125 2024-08-09 23:19:43,637 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-09 23:19:54,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=29.42 vs. limit=22.5 2024-08-09 23:20:01,311 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.280e-03 2024-08-09 23:20:05,187 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-09 23:20:05,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=248280.0, ans=0.125 2024-08-09 23:20:06,627 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 23:20:14,669 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 23:20:25,863 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-09 23:20:31,417 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-09 23:20:34,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10350, loss[loss=0.1315, beats_loss=0.0113, ecapa_loss=0.0002859, whisper_loss=0.1174, over 22784.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0126, ecapa_loss=0.0003204, whisper_loss=0.1007, over 3879531.98 frames. ], batch size: 87, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:20:37,306 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-09 23:20:37,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=12.0 2024-08-09 23:20:43,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=248480.0, ans=0.125 2024-08-09 23:20:59,677 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-09 23:21:07,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=248680.0, ans=0.2 2024-08-09 23:21:11,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=248680.0, ans=0.2 2024-08-09 23:21:11,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=12.0 2024-08-09 23:21:23,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=248780.0, ans=0.125 2024-08-09 23:21:26,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.44 vs. limit=22.5 2024-08-09 23:21:29,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=248780.0, ans=0.125 2024-08-09 23:21:46,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10400, loss[loss=0.1376, beats_loss=0.008554, ecapa_loss=0.0003898, whisper_loss=0.1252, over 14719.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.0126, ecapa_loss=0.0003191, whisper_loss=0.1, over 3874129.42 frames. 
], batch size: 60, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:21:48,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 2.757e+01 3.226e+01 3.794e+01 6.112e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-09 23:21:55,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=248980.0, ans=0.125 2024-08-09 23:21:56,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248980.0, ans=0.1 2024-08-09 23:21:57,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248980.0, ans=0.125 2024-08-09 23:22:01,471 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-09 23:22:07,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=249080.0, ans=0.0 2024-08-09 23:22:21,567 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-09 23:22:24,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.04 vs. limit=12.0 2024-08-09 23:22:29,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=249280.0, ans=0.04949747468305833 2024-08-09 23:22:48,570 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.734e+03 2024-08-09 23:22:49,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2024-08-09 23:22:50,743 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-09 23:22:54,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10450, loss[loss=0.1066, beats_loss=0.01446, ecapa_loss=0.000236, whisper_loss=0.08975, over 15534.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01254, ecapa_loss=0.0003178, whisper_loss=0.1001, over 3845627.54 frames. ], batch size: 59, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:23:00,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=249480.0, ans=0.0 2024-08-09 23:23:00,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=249480.0, ans=0.125 2024-08-09 23:23:14,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=249580.0, ans=0.0 2024-08-09 23:23:19,613 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-09 23:23:24,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=22.5 2024-08-09 23:23:39,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.60 vs. limit=10.0 2024-08-09 23:23:44,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2024-08-09 23:23:45,315 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 23:23:57,266 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-09 23:23:58,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=249880.0, ans=0.0 2024-08-09 23:24:02,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10500, loss[loss=0.1172, beats_loss=0.01656, ecapa_loss=0.0003003, whisper_loss=0.09767, over 17491.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01259, ecapa_loss=0.000319, whisper_loss=0.09929, over 3844341.47 frames. ], batch size: 72, lr: 2.37e-02, grad_scale: 131072.0 2024-08-09 23:24:05,301 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.948e+01 3.458e+01 4.084e+01 6.883e+01, threshold=6.915e+01, percent-clipped=1.0 2024-08-09 23:24:09,909 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-09 23:24:11,159 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 23:24:11,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2024-08-09 23:25:13,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10550, loss[loss=0.1119, beats_loss=0.01397, ecapa_loss=0.0003075, whisper_loss=0.09489, over 23470.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01261, ecapa_loss=0.0003198, whisper_loss=0.09943, over 3868310.94 frames. ], batch size: 94, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:25:31,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.24 vs. limit=15.0 2024-08-09 23:26:02,038 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-09 23:26:05,917 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
23 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-09 23:26:12,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5 2024-08-09 23:26:17,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250880.0, ans=0.1 2024-08-09 23:26:20,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2024-08-09 23:26:22,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10600, loss[loss=0.1221, beats_loss=0.01168, ecapa_loss=0.0003077, whisper_loss=0.1073, over 22619.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01266, ecapa_loss=0.000322, whisper_loss=0.1003, over 3901354.46 frames. ], batch size: 87, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:26:24,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=250980.0, ans=0.02 2024-08-09 23:26:25,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 3.120e+01 3.519e+01 3.971e+01 7.530e+01, threshold=7.037e+01, percent-clipped=1.0 2024-08-09 23:26:28,377 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-09 23:26:31,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=250980.0, ans=0.2 2024-08-09 23:26:41,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-09 23:26:42,885 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-09 23:27:21,985 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-09 23:27:22,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2024-08-09 23:27:32,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10650, loss[loss=0.1225, beats_loss=0.01293, ecapa_loss=0.0003455, whisper_loss=0.1061, over 23013.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01274, ecapa_loss=0.0003189, whisper_loss=0.1001, over 3886447.36 frames. ], batch size: 91, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:27:35,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-09 23:27:36,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=251480.0, ans=0.125 2024-08-09 23:27:41,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. 
limit=15.0 2024-08-09 23:27:44,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251580.0, ans=0.1 2024-08-09 23:28:05,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=251680.0, ans=0.125 2024-08-09 23:28:17,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=251780.0, ans=0.0 2024-08-09 23:28:21,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=251780.0, ans=0.125 2024-08-09 23:28:23,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251780.0, ans=0.1 2024-08-09 23:28:33,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.35 vs. limit=6.0 2024-08-09 23:28:41,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10700, loss[loss=0.1261, beats_loss=0.01154, ecapa_loss=0.0003447, whisper_loss=0.1111, over 19263.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01269, ecapa_loss=0.0003188, whisper_loss=0.1007, over 3904259.31 frames. ], batch size: 77, lr: 2.36e-02, grad_scale: 131072.0 2024-08-09 23:28:44,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.278e+01 2.878e+01 3.295e+01 3.921e+01 5.869e+01, threshold=6.590e+01, percent-clipped=0.0 2024-08-09 23:28:47,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2024-08-09 23:28:53,849 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
31 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-09 23:29:00,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-09 23:29:10,082 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 30 from LS+wenet, 9 from Vox, 18 fro AS 2024-08-09 23:29:14,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=252180.0, ans=0.0 2024-08-09 23:29:32,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=252280.0, ans=0.0 2024-08-09 23:29:34,989 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-09 23:29:36,321 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-09 23:29:51,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10750, loss[loss=0.1127, beats_loss=0.01222, ecapa_loss=0.0003508, whisper_loss=0.09694, over 21666.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01257, ecapa_loss=0.0003209, whisper_loss=0.1013, over 3902800.90 frames. ], batch size: 91, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:29:51,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252480.0, ans=0.1 2024-08-09 23:29:55,520 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-09 23:29:58,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.94 vs. limit=22.5 2024-08-09 23:30:03,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252480.0, ans=0.125 2024-08-09 23:30:07,087 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
38 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-09 23:30:17,123 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 23:30:42,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=252780.0, ans=0.125 2024-08-09 23:30:44,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=252780.0, ans=0.0 2024-08-09 23:30:51,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.02 vs. limit=15.0 2024-08-09 23:30:59,317 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 23 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-09 23:31:00,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10800, loss[loss=0.1006, beats_loss=0.01511, ecapa_loss=0.0003307, whisper_loss=0.08214, over 22571.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01256, ecapa_loss=0.0003189, whisper_loss=0.1013, over 3892959.37 frames. ], batch size: 96, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:31:02,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=252980.0, ans=0.125 2024-08-09 23:31:03,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 3.032e+01 3.349e+01 3.769e+01 6.080e+01, threshold=6.698e+01, percent-clipped=0.0 2024-08-09 23:31:06,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=252980.0, ans=0.0 2024-08-09 23:31:16,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. 
limit=15.0 2024-08-09 23:31:26,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-08-09 23:31:28,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2024-08-09 23:31:29,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-09 23:31:32,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-08-09 23:31:42,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=253280.0, ans=0.0 2024-08-09 23:31:57,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=253380.0, ans=0.125 2024-08-09 23:31:58,391 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-09 23:32:03,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:32:07,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10850, loss[loss=0.1046, beats_loss=0.01216, ecapa_loss=0.0003397, whisper_loss=0.08907, over 19978.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01261, ecapa_loss=0.0003166, whisper_loss=0.1015, over 3910074.66 frames. 
], batch size: 80, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:32:08,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=253480.0, ans=0.1 2024-08-09 23:32:14,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=253480.0, ans=0.2 2024-08-09 23:32:19,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-09 23:32:27,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=253580.0, ans=0.125 2024-08-09 23:32:54,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=253780.0, ans=0.125 2024-08-09 23:32:58,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2024-08-09 23:33:00,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=253880.0, ans=0.125 2024-08-09 23:33:12,536 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-09 23:33:15,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10900, loss[loss=0.1057, beats_loss=0.01453, ecapa_loss=0.0003586, whisper_loss=0.08761, over 21068.00 frames. ], tot_loss[loss=0.1172, beats_loss=0.01257, ecapa_loss=0.0003177, whisper_loss=0.1015, over 3940099.23 frames. ], batch size: 91, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:33:18,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+01 2.959e+01 3.403e+01 3.969e+01 5.664e+01, threshold=6.807e+01, percent-clipped=0.0 2024-08-09 23:33:24,216 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-09 23:33:32,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=254080.0, ans=0.0 2024-08-09 23:33:35,956 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-09 23:33:44,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=254180.0, ans=0.125 2024-08-09 23:33:49,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=254180.0, ans=0.0 2024-08-09 23:33:58,828 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-09 23:34:12,181 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 14 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-09 23:34:13,789 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-09 23:34:15,036 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-09 23:34:22,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 10950, loss[loss=0.1247, beats_loss=0.01289, ecapa_loss=0.0003356, whisper_loss=0.1085, over 19340.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01262, ecapa_loss=0.0003198, whisper_loss=0.1007, over 3925727.30 frames. ], batch size: 82, lr: 2.35e-02, grad_scale: 131072.0 2024-08-09 23:34:40,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. 
limit=15.0 2024-08-09 23:34:47,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=254580.0, ans=0.125 2024-08-09 23:34:55,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254680.0, ans=0.1 2024-08-09 23:34:55,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=254680.0, ans=0.125 2024-08-09 23:35:06,715 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-09 23:35:09,513 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-09 23:35:16,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0 2024-08-09 23:35:22,774 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-09 23:35:28,612 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-09 23:35:30,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11000, loss[loss=0.117, beats_loss=0.01414, ecapa_loss=0.0003334, whisper_loss=0.09951, over 21934.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01273, ecapa_loss=0.0003219, whisper_loss=0.09932, over 3961997.13 frames. ], batch size: 93, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:35:33,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.844e+01 3.291e+01 3.745e+01 5.513e+01, threshold=6.582e+01, percent-clipped=0.0 2024-08-09 23:35:59,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=255180.0, ans=0.125 2024-08-09 23:36:02,274 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 23:36:12,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=255280.0, ans=0.125 2024-08-09 23:36:20,447 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-09 23:36:33,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255380.0, ans=0.1 2024-08-09 23:36:37,460 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-09 23:36:41,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11050, loss[loss=0.1175, beats_loss=0.01338, ecapa_loss=0.0002537, whisper_loss=0.1015, over 19398.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.0126, ecapa_loss=0.000322, whisper_loss=0.09972, over 3939119.09 frames. ], batch size: 74, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:36:52,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=255480.0, ans=0.0 2024-08-09 23:36:55,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=255580.0, ans=0.0 2024-08-09 23:36:58,247 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-09 23:37:06,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=255580.0, ans=0.125 2024-08-09 23:37:12,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=12.0 2024-08-09 23:37:23,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.73 vs. 
limit=22.5 2024-08-09 23:37:42,400 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-09 23:37:50,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11100, loss[loss=0.1036, beats_loss=0.01212, ecapa_loss=0.000304, whisper_loss=0.08841, over 14340.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01253, ecapa_loss=0.0003196, whisper_loss=0.1, over 3926411.94 frames. ], batch size: 55, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:37:53,141 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+01 3.083e+01 3.527e+01 4.357e+01 6.576e+01, threshold=7.054e+01, percent-clipped=0.0 2024-08-09 23:38:13,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=256080.0, ans=0.125 2024-08-09 23:38:15,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. limit=10.0 2024-08-09 23:38:18,770 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
32 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-09 23:38:20,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=256180.0, ans=0.125 2024-08-09 23:38:28,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=256180.0, ans=0.0 2024-08-09 23:38:41,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=256280.0, ans=0.125 2024-08-09 23:38:50,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=256380.0, ans=0.125 2024-08-09 23:38:59,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11150, loss[loss=0.09212, beats_loss=0.01326, ecapa_loss=0.0003375, whisper_loss=0.07549, over 18351.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0125, ecapa_loss=0.0003195, whisper_loss=0.1008, over 3909881.06 frames. ], batch size: 71, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:39:02,475 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-09 23:39:16,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.66 vs. limit=22.5 2024-08-09 23:39:17,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=256580.0, ans=0.125 2024-08-09 23:39:23,718 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-09 23:39:26,459 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-09 23:39:29,174 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 23:39:31,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256680.0, ans=0.1 2024-08-09 23:39:39,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=22.5 2024-08-09 23:39:43,264 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-09 23:39:46,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=256780.0, ans=0.125 2024-08-09 23:40:01,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=256880.0, ans=0.125 2024-08-09 23:40:04,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=256880.0, ans=0.035 2024-08-09 23:40:09,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11200, loss[loss=0.1093, beats_loss=0.01198, ecapa_loss=0.0003066, whisper_loss=0.09428, over 17686.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01239, ecapa_loss=0.000322, whisper_loss=0.1013, over 3899414.62 frames. ], batch size: 69, lr: 2.34e-02, grad_scale: 131072.0 2024-08-09 23:40:12,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 3.109e+01 3.535e+01 4.149e+01 6.453e+01, threshold=7.070e+01, percent-clipped=0.0 2024-08-09 23:40:19,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.73 vs. limit=22.5 2024-08-09 23:40:36,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=257180.0, ans=0.125 2024-08-09 23:40:39,323 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-09 23:40:58,838 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-09 23:41:02,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=257280.0, ans=0.0 2024-08-09 23:41:12,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-08-09 23:41:13,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=257380.0, ans=0.2 2024-08-09 23:41:19,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11250, loss[loss=0.1048, beats_loss=0.01428, ecapa_loss=0.0002739, whisper_loss=0.08779, over 18523.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01245, ecapa_loss=0.0003208, whisper_loss=0.1011, over 3885294.11 frames. ], batch size: 74, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:41:33,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.41 vs. limit=22.5 2024-08-09 23:41:36,666 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
23 from LS+wenet, 35 from Vox, 32 fro AS 2024-08-09 23:41:42,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=257580.0, ans=0.0 2024-08-09 23:41:49,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=257680.0, ans=0.0 2024-08-09 23:41:53,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=257680.0, ans=0.0 2024-08-09 23:41:59,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257680.0, ans=0.1 2024-08-09 23:41:59,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=257680.0, ans=0.0 2024-08-09 23:42:05,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=257780.0, ans=0.0 2024-08-09 23:42:06,709 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-09 23:42:10,792 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 10 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-09 23:42:22,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0 2024-08-09 23:42:23,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=257880.0, ans=0.2 2024-08-09 23:42:24,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=257880.0, ans=0.2 2024-08-09 23:42:28,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11300, loss[loss=0.09976, beats_loss=0.01668, ecapa_loss=0.0002341, whisper_loss=0.08074, over 20350.00 frames. 
], tot_loss[loss=0.1158, beats_loss=0.0125, ecapa_loss=0.00032, whisper_loss=0.1001, over 3871733.79 frames. ], batch size: 81, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:42:31,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+01 3.110e+01 3.449e+01 4.025e+01 6.550e+01, threshold=6.899e+01, percent-clipped=0.0 2024-08-09 23:42:34,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=257980.0, ans=0.035 2024-08-09 23:42:34,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=12.0 2024-08-09 23:42:52,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2024-08-09 23:43:18,985 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 25 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-09 23:43:19,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=258280.0, ans=0.2 2024-08-09 23:43:28,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=258380.0, ans=0.05 2024-08-09 23:43:36,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11350, loss[loss=0.1163, beats_loss=0.01363, ecapa_loss=0.0003252, whisper_loss=0.09942, over 22209.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01244, ecapa_loss=0.0003206, whisper_loss=0.1001, over 3880429.20 frames. ], batch size: 92, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:43:51,658 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-09 23:43:53,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=258580.0, ans=0.125 2024-08-09 23:43:56,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=258580.0, ans=0.0 2024-08-09 23:44:00,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=258580.0, ans=0.125 2024-08-09 23:44:12,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258680.0, ans=0.1 2024-08-09 23:44:26,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=258780.0, ans=0.0 2024-08-09 23:44:34,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258880.0, ans=0.125 2024-08-09 23:44:35,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.33 vs. limit=22.5 2024-08-09 23:44:39,945 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-09 23:44:44,754 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11400, loss[loss=0.09559, beats_loss=0.01563, ecapa_loss=0.000269, whisper_loss=0.07726, over 21681.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01241, ecapa_loss=0.0003181, whisper_loss=0.1011, over 3862877.09 frames. 
], batch size: 89, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:44:47,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+01 2.889e+01 3.232e+01 3.833e+01 5.860e+01, threshold=6.464e+01, percent-clipped=0.0 2024-08-09 23:45:12,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=259180.0, ans=0.2 2024-08-09 23:45:50,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=259380.0, ans=0.0 2024-08-09 23:45:58,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11450, loss[loss=0.1105, beats_loss=0.01403, ecapa_loss=0.0003377, whisper_loss=0.09312, over 20966.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01241, ecapa_loss=0.0003193, whisper_loss=0.1018, over 3880007.91 frames. ], batch size: 90, lr: 2.33e-02, grad_scale: 131072.0 2024-08-09 23:46:08,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259480.0, ans=0.1 2024-08-09 23:46:22,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=259580.0, ans=0.09899494936611666 2024-08-09 23:46:29,732 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-09 23:46:45,115 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 23:46:51,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=259780.0, ans=0.2 2024-08-09 23:46:53,119 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-09 23:47:08,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11500, loss[loss=0.1132, beats_loss=0.009938, ecapa_loss=0.0003221, whisper_loss=0.1, over 13891.00 frames. 
], tot_loss[loss=0.1171, beats_loss=0.01248, ecapa_loss=0.0003171, whisper_loss=0.1015, over 3892624.76 frames. ], batch size: 56, lr: 2.32e-02, grad_scale: 131072.0 2024-08-09 23:47:08,983 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-09 23:47:10,339 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-09 23:47:11,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.269e+01 3.028e+01 3.430e+01 4.047e+01 6.324e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-09 23:47:18,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=259980.0, ans=0.09899494936611666 2024-08-09 23:47:28,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=260080.0, ans=0.0 2024-08-09 23:47:33,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=260080.0, ans=0.0 2024-08-09 23:47:35,948 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-09 23:47:40,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=260180.0, ans=0.125 2024-08-09 23:47:51,644 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 23:47:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=260280.0, ans=0.2 2024-08-09 23:48:00,890 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
32 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-09 23:48:05,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260380.0, ans=0.125 2024-08-09 23:48:12,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=260380.0, ans=0.125 2024-08-09 23:48:17,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11550, loss[loss=0.1067, beats_loss=0.01473, ecapa_loss=0.0002824, whisper_loss=0.08915, over 18135.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01254, ecapa_loss=0.0003164, whisper_loss=0.1004, over 3852601.30 frames. ], batch size: 71, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:48:18,847 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-09 23:48:23,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2024-08-09 23:48:24,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=260480.0, ans=0.09899494936611666 2024-08-09 23:48:32,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.91 vs. limit=22.5 2024-08-09 23:48:51,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.58 vs. limit=22.5 2024-08-09 23:48:54,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.42 vs. 
limit=15.0 2024-08-09 23:49:00,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=260780.0, ans=0.05 2024-08-09 23:49:02,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=260780.0, ans=0.2 2024-08-09 23:49:04,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=260780.0, ans=0.2 2024-08-09 23:49:09,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=260780.0, ans=0.07 2024-08-09 23:49:26,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11600, loss[loss=0.1038, beats_loss=0.01316, ecapa_loss=0.0003412, whisper_loss=0.08722, over 22436.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01255, ecapa_loss=0.0003157, whisper_loss=0.1002, over 3870230.81 frames. ], batch size: 90, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:49:27,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=260980.0, ans=0.02 2024-08-09 23:49:29,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.873e+01 3.365e+01 3.781e+01 5.038e+01, threshold=6.731e+01, percent-clipped=0.0 2024-08-09 23:49:37,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2024-08-09 23:49:47,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=261080.0, ans=0.0 2024-08-09 23:50:24,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. 
limit=15.0 2024-08-09 23:50:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=261380.0, ans=0.025 2024-08-09 23:50:37,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11650, loss[loss=0.1226, beats_loss=0.01345, ecapa_loss=0.0003132, whisper_loss=0.106, over 17472.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01252, ecapa_loss=0.0003153, whisper_loss=0.1008, over 3872672.79 frames. ], batch size: 69, lr: 2.32e-02, grad_scale: 262144.0 2024-08-09 23:50:41,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0 2024-08-09 23:51:01,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=261580.0, ans=0.1 2024-08-09 23:51:04,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=261680.0, ans=0.0 2024-08-09 23:51:05,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2024-08-09 23:51:08,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261680.0, ans=0.1 2024-08-09 23:51:14,310 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-09 23:51:15,737 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-09 23:51:25,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261780.0, ans=0.1 2024-08-09 23:51:41,917 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-09 23:51:46,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=261980.0, ans=0.125 2024-08-09 23:51:46,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11700, loss[loss=0.1358, beats_loss=0.01131, ecapa_loss=0.0003217, whisper_loss=0.1213, over 21393.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01249, ecapa_loss=0.0003178, whisper_loss=0.1013, over 3908857.49 frames. ], batch size: 88, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:51:49,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 3.059e+01 3.535e+01 4.179e+01 1.066e+02, threshold=7.070e+01, percent-clipped=1.0 2024-08-09 23:51:53,867 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-09 23:51:54,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=261980.0, ans=15.0 2024-08-09 23:52:11,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-09 23:52:13,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=262180.0, ans=0.035 2024-08-09 23:52:15,196 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-09 23:52:25,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=262180.0, ans=0.125 2024-08-09 23:52:26,521 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-09 23:52:34,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=262280.0, ans=0.0 2024-08-09 23:52:36,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262280.0, ans=0.125 2024-08-09 23:52:44,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=262380.0, ans=0.0 2024-08-09 23:52:49,555 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-09 23:52:54,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11750, loss[loss=0.1197, beats_loss=0.01569, ecapa_loss=0.0002956, whisper_loss=0.1011, over 23028.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01265, ecapa_loss=0.0003162, whisper_loss=0.1012, over 3894831.79 frames. ], batch size: 94, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:53:19,978 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.532e-01 2024-08-09 23:53:30,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=262680.0, ans=0.1 2024-08-09 23:53:38,193 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-09 23:53:48,297 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 32 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-09 23:54:00,064 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-09 23:54:02,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11800, loss[loss=0.1337, beats_loss=0.01201, ecapa_loss=0.0002896, whisper_loss=0.1188, over 16604.00 frames. ], tot_loss[loss=0.1173, beats_loss=0.0127, ecapa_loss=0.0003154, whisper_loss=0.1015, over 3908273.42 frames. 
], batch size: 61, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:54:05,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 3.014e+01 3.516e+01 4.289e+01 8.691e+01, threshold=7.033e+01, percent-clipped=2.0 2024-08-09 23:54:11,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262980.0, ans=0.0 2024-08-09 23:54:12,763 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-09 23:54:29,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=263180.0, ans=0.5 2024-08-09 23:54:43,799 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-09 23:54:59,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=263380.0, ans=0.2 2024-08-09 23:54:59,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=263380.0, ans=0.125 2024-08-09 23:55:00,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=263380.0, ans=0.0 2024-08-09 23:55:05,637 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 7 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-09 23:55:08,806 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-09 23:55:11,169 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11850, loss[loss=0.1101, beats_loss=0.01687, ecapa_loss=0.0002211, whisper_loss=0.09105, over 20704.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01269, ecapa_loss=0.0003131, whisper_loss=0.1007, over 3915874.17 frames. ], batch size: 81, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:55:37,944 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-09 23:55:38,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=263680.0, ans=0.125 2024-08-09 23:55:39,281 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-09 23:55:46,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=263680.0, ans=0.125 2024-08-09 23:55:59,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=263780.0, ans=0.0 2024-08-09 23:56:03,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=263880.0, ans=0.125 2024-08-09 23:56:05,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263880.0, ans=0.1 2024-08-09 23:56:13,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=263880.0, ans=0.95 2024-08-09 23:56:18,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11900, loss[loss=0.09165, beats_loss=0.01364, ecapa_loss=0.0003007, whisper_loss=0.075, over 13138.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01264, ecapa_loss=0.0003121, whisper_loss=0.1013, over 3942667.63 frames. ], batch size: 55, lr: 2.31e-02, grad_scale: 262144.0 2024-08-09 23:56:21,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.968e+01 3.550e+01 4.423e+01 6.843e+01, threshold=7.099e+01, percent-clipped=0.0 2024-08-09 23:56:33,975 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-09 23:56:42,028 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-09 23:56:42,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=264080.0, ans=0.125 2024-08-09 23:56:45,984 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-09 23:57:09,248 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-09 23:57:16,065 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-09 23:57:24,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264380.0, ans=0.1 2024-08-09 23:57:26,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 11950, loss[loss=0.08696, beats_loss=0.01325, ecapa_loss=0.0003727, whisper_loss=0.06999, over 19811.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01269, ecapa_loss=0.0003129, whisper_loss=0.1008, over 3945751.82 frames. ], batch size: 88, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:57:56,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=264680.0, ans=0.2 2024-08-09 23:58:00,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=264680.0, ans=0.2 2024-08-09 23:58:11,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2024-08-09 23:58:28,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=264880.0, ans=0.0 2024-08-09 23:58:32,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. 
limit=15.0 2024-08-09 23:58:35,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12000, loss[loss=0.1008, beats_loss=0.01263, ecapa_loss=0.0003392, whisper_loss=0.08475, over 16537.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01269, ecapa_loss=0.0003123, whisper_loss=0.1002, over 3915021.25 frames. ], batch size: 67, lr: 2.30e-02, grad_scale: 262144.0 2024-08-09 23:58:35,884 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-09 23:59:15,185 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on ASR_libri: loss=0.2807, beats_loss=0, ecapa_loss=0.0009345, whisper_loss=0.2713, over 922467.00 frames. 2024-08-09 23:59:32,520 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on SV_voxceleb1: loss=0.008336, beats_loss=0, ecapa_loss=0.0008336, whisper_loss=0, over 939242.00 frames. 2024-08-10 00:01:27,294 INFO [train_multi_KD3.py:1149] (0/4) Epoch 2, validation on AT_audioset: loss=0.02968, beats_loss=0.02968, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 00:01:27,298 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 00:01:29,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.941e+01 3.442e+01 3.928e+01 6.406e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 00:01:34,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2024-08-10 00:01:47,109 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.554e+00 2024-08-10 00:02:02,352 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
20 from LS+wenet, 31 from Vox, 21 fro AS 2024-08-10 00:02:05,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=265180.0, ans=0.125 2024-08-10 00:02:11,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-10 00:02:24,910 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 00:02:26,320 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 00:02:37,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12050, loss[loss=0.1143, beats_loss=0.01467, ecapa_loss=0.0002752, whisper_loss=0.09692, over 21312.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01264, ecapa_loss=0.0003145, whisper_loss=0.1005, over 3875352.83 frames. ], batch size: 86, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:02:49,495 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 00:02:55,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.32 vs. limit=22.5 2024-08-10 00:02:58,589 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 00:03:03,119 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 00:03:07,284 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 00:03:10,095 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 00:03:31,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265780.0, ans=0.1 2024-08-10 00:03:35,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=265880.0, ans=0.0 2024-08-10 00:03:47,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12100, loss[loss=0.1318, beats_loss=0.009642, ecapa_loss=0.0003152, whisper_loss=0.119, over 21145.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01254, ecapa_loss=0.0003152, whisper_loss=0.101, over 3882828.39 frames. ], batch size: 84, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:03:50,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.134e+01 3.753e+01 4.563e+01 7.245e+01, threshold=7.507e+01, percent-clipped=1.0 2024-08-10 00:03:51,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-08-10 00:03:52,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=265980.0, ans=0.0 2024-08-10 00:03:55,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=265980.0, ans=0.125 2024-08-10 00:04:06,418 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-10 00:04:16,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=266180.0, ans=0.0 2024-08-10 00:04:32,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-10 00:04:42,553 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 00:04:46,918 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-10 00:04:50,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=266380.0, ans=0.0 2024-08-10 00:04:51,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0 2024-08-10 00:04:57,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12150, loss[loss=0.09494, beats_loss=0.01298, ecapa_loss=0.0003696, whisper_loss=0.07827, over 17329.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01251, ecapa_loss=0.0003151, whisper_loss=0.1009, over 3862521.61 frames. ], batch size: 72, lr: 2.30e-02, grad_scale: 262144.0 2024-08-10 00:05:03,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.04 vs. 
limit=15.0 2024-08-10 00:05:04,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=266480.0, ans=0.125 2024-08-10 00:05:08,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=266480.0, ans=0.125 2024-08-10 00:05:15,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=266580.0, ans=0.125 2024-08-10 00:05:17,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=266580.0, ans=0.2 2024-08-10 00:05:32,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=266680.0, ans=0.125 2024-08-10 00:05:32,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=266680.0, ans=0.125 2024-08-10 00:05:43,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=266780.0, ans=0.125 2024-08-10 00:05:46,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=266780.0, ans=0.0 2024-08-10 00:05:54,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=266880.0, ans=0.0 2024-08-10 00:06:08,203 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12200, loss[loss=0.121, beats_loss=0.01197, ecapa_loss=0.0002473, whisper_loss=0.1065, over 16397.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01254, ecapa_loss=0.0003156, whisper_loss=0.1005, over 3845462.51 frames. 
], batch size: 61, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:06:11,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.872e+01 3.325e+01 3.813e+01 6.794e+01, threshold=6.650e+01, percent-clipped=0.0 2024-08-10 00:06:23,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=267080.0, ans=0.125 2024-08-10 00:06:27,034 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 00:06:30,856 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-10 00:06:41,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=267180.0, ans=0.2 2024-08-10 00:06:42,258 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 10 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 00:06:49,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-08-10 00:06:50,601 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 00:07:15,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=267380.0, ans=0.125 2024-08-10 00:07:18,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12250, loss[loss=0.1365, beats_loss=0.01208, ecapa_loss=0.0002574, whisper_loss=0.1218, over 15499.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01258, ecapa_loss=0.0003124, whisper_loss=0.1007, over 3844136.71 frames. 
], batch size: 58, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:07:31,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267580.0, ans=0.125 2024-08-10 00:07:34,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2024-08-10 00:07:40,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.40 vs. limit=10.0 2024-08-10 00:07:47,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=267680.0, ans=0.2 2024-08-10 00:07:48,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.72 vs. limit=10.0 2024-08-10 00:07:55,950 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 00:08:05,447 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 00:08:25,146 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:08:27,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12300, loss[loss=0.0839, beats_loss=0.01426, ecapa_loss=0.0002526, whisper_loss=0.06712, over 15652.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01266, ecapa_loss=0.0003131, whisper_loss=0.1, over 3862897.78 frames. ], batch size: 61, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:08:30,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.986e+01 3.586e+01 4.164e+01 6.809e+01, threshold=7.172e+01, percent-clipped=1.0 2024-08-10 00:08:37,007 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
18 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-10 00:08:55,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-08-10 00:08:58,224 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 00:09:02,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=268180.0, ans=10.0 2024-08-10 00:09:10,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=268280.0, ans=0.07 2024-08-10 00:09:18,333 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 00:09:20,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5 2024-08-10 00:09:36,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12350, loss[loss=0.1356, beats_loss=0.01256, ecapa_loss=0.0003156, whisper_loss=0.1199, over 24019.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01268, ecapa_loss=0.0003149, whisper_loss=0.1004, over 3876949.10 frames. ], batch size: 93, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:09:39,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=20.91 vs. limit=15.0 2024-08-10 00:09:49,332 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 00:09:54,069 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 00:10:08,256 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 00:10:11,312 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 00:10:48,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12400, loss[loss=0.1173, beats_loss=0.01413, ecapa_loss=0.0002645, whisper_loss=0.1006, over 15266.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01263, ecapa_loss=0.0003155, whisper_loss=0.1006, over 3879958.65 frames. ], batch size: 60, lr: 2.29e-02, grad_scale: 262144.0 2024-08-10 00:10:48,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=268980.0, ans=0.0 2024-08-10 00:10:50,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-10 00:10:50,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 2.997e+01 3.426e+01 4.143e+01 8.992e+01, threshold=6.852e+01, percent-clipped=1.0 2024-08-10 00:10:56,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=268980.0, ans=0.125 2024-08-10 00:10:58,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=268980.0, ans=0.2 2024-08-10 00:11:03,641 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 00:11:23,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=269180.0, ans=0.125 2024-08-10 00:11:24,267 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 00:11:31,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269280.0, ans=0.1 2024-08-10 00:11:32,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=269280.0, ans=0.0 2024-08-10 00:11:34,291 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 00:11:36,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0 2024-08-10 00:11:41,050 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 00:11:46,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269380.0, ans=0.1 2024-08-10 00:11:57,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=269480.0, ans=0.125 2024-08-10 00:11:58,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12450, loss[loss=0.1175, beats_loss=0.01246, ecapa_loss=0.0002446, whisper_loss=0.1026, over 24197.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01266, ecapa_loss=0.0003141, whisper_loss=0.1004, over 3868931.59 frames. ], batch size: 93, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:12:00,875 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 00:12:01,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=269480.0, ans=0.0 2024-08-10 00:12:10,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.57 vs. 
limit=15.0 2024-08-10 00:12:21,656 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2024-08-10 00:12:25,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-10 00:12:33,588 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 00:12:56,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-08-10 00:13:08,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12500, loss[loss=0.09983, beats_loss=0.01159, ecapa_loss=0.0004211, whisper_loss=0.08404, over 19098.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01262, ecapa_loss=0.0003127, whisper_loss=0.1011, over 3902677.64 frames. ], batch size: 84, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:13:11,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.015e+01 3.443e+01 4.080e+01 3.263e+02, threshold=6.886e+01, percent-clipped=2.0 2024-08-10 00:13:25,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270080.0, ans=0.1 2024-08-10 00:13:29,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=270080.0, ans=0.125 2024-08-10 00:13:41,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. 
limit=10.0 2024-08-10 00:13:52,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=270280.0, ans=0.0 2024-08-10 00:14:17,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12550, loss[loss=0.1138, beats_loss=0.01225, ecapa_loss=0.0003796, whisper_loss=0.09774, over 21464.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01267, ecapa_loss=0.0003119, whisper_loss=0.1007, over 3915941.54 frames. ], batch size: 93, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:14:26,260 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 00:14:29,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=270480.0, ans=0.0 2024-08-10 00:14:39,725 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 00:14:50,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=15.0 2024-08-10 00:14:51,071 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 00:15:05,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=270780.0, ans=0.2 2024-08-10 00:15:14,938 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 00:15:22,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=270880.0, ans=0.125 2024-08-10 00:15:23,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.53 vs. 
limit=15.0 2024-08-10 00:15:27,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12600, loss[loss=0.1391, beats_loss=0.009497, ecapa_loss=0.0003023, whisper_loss=0.1266, over 17970.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01258, ecapa_loss=0.0003127, whisper_loss=0.101, over 3895399.32 frames. ], batch size: 68, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:15:30,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 3.077e+01 3.630e+01 3.984e+01 7.187e+01, threshold=7.260e+01, percent-clipped=1.0 2024-08-10 00:15:32,227 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-10 00:15:34,954 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 00:15:36,399 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 00:15:37,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=270980.0, ans=0.0 2024-08-10 00:15:49,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271080.0, ans=0.1 2024-08-10 00:15:50,292 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-10 00:16:15,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=271280.0, ans=0.125 2024-08-10 00:16:18,339 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 00:16:36,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271480.0, ans=0.1 2024-08-10 00:16:37,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12650, loss[loss=0.09587, beats_loss=0.01791, ecapa_loss=0.00022, whisper_loss=0.07575, over 22787.00 frames. 
], tot_loss[loss=0.117, beats_loss=0.01263, ecapa_loss=0.0003098, whisper_loss=0.1012, over 3880855.76 frames. ], batch size: 92, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:16:55,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=271580.0, ans=0.125 2024-08-10 00:16:56,440 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 00:16:57,819 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 00:17:10,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=271680.0, ans=0.125 2024-08-10 00:17:25,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=271780.0, ans=0.125 2024-08-10 00:17:35,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271880.0, ans=0.125 2024-08-10 00:17:47,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12700, loss[loss=0.1318, beats_loss=0.009822, ecapa_loss=0.0003664, whisper_loss=0.1183, over 23404.00 frames. ], tot_loss[loss=0.1174, beats_loss=0.01259, ecapa_loss=0.0003109, whisper_loss=0.1017, over 3892778.63 frames. 
], batch size: 95, lr: 2.28e-02, grad_scale: 262144.0 2024-08-10 00:17:50,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 3.012e+01 3.366e+01 3.844e+01 6.101e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 00:18:03,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=272080.0, ans=0.2 2024-08-10 00:18:14,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=272180.0, ans=0.125 2024-08-10 00:18:19,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=272180.0, ans=0.125 2024-08-10 00:18:46,129 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 35 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 00:18:57,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12750, loss[loss=0.1439, beats_loss=0.009462, ecapa_loss=0.0003236, whisper_loss=0.1312, over 19950.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01266, ecapa_loss=0.0003122, whisper_loss=0.1012, over 3895699.06 frames. ], batch size: 78, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:19:16,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=272580.0, ans=0.0 2024-08-10 00:19:26,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=272680.0, ans=0.2 2024-08-10 00:19:33,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=12.0 2024-08-10 00:19:38,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=272780.0, ans=0.0 2024-08-10 00:19:42,940 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
20 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-10 00:20:07,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12800, loss[loss=0.1348, beats_loss=0.008843, ecapa_loss=0.0003834, whisper_loss=0.1221, over 15054.00 frames. ], tot_loss[loss=0.1171, beats_loss=0.01262, ecapa_loss=0.0003143, whisper_loss=0.1013, over 3920544.13 frames. ], batch size: 59, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:20:10,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 2.990e+01 3.546e+01 4.142e+01 8.927e+01, threshold=7.091e+01, percent-clipped=1.0 2024-08-10 00:20:14,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2024-08-10 00:20:16,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=272980.0, ans=0.125 2024-08-10 00:20:19,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=272980.0, ans=0.125 2024-08-10 00:20:23,463 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 00:20:25,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0 2024-08-10 00:20:30,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=273080.0, ans=0.035 2024-08-10 00:20:34,667 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 00:20:36,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=273180.0, ans=0.125 2024-08-10 00:20:43,001 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 00:20:47,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=273180.0, ans=0.0 2024-08-10 00:20:49,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273280.0, ans=0.125 2024-08-10 00:21:11,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=273380.0, ans=0.2 2024-08-10 00:21:13,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=273380.0, ans=0.2 2024-08-10 00:21:13,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-08-10 00:21:17,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273480.0, ans=0.1 2024-08-10 00:21:18,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12850, loss[loss=0.109, beats_loss=0.01322, ecapa_loss=0.0002764, whisper_loss=0.09303, over 22026.00 frames. ], tot_loss[loss=0.1167, beats_loss=0.01265, ecapa_loss=0.0003139, whisper_loss=0.1009, over 3902792.66 frames. ], batch size: 90, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:21:21,113 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 00:21:21,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=273480.0, ans=0.1 2024-08-10 00:21:21,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.89 vs. 
limit=22.5 2024-08-10 00:21:25,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=273480.0, ans=0.125 2024-08-10 00:21:40,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2024-08-10 00:22:03,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=273780.0, ans=0.125 2024-08-10 00:22:13,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=273880.0, ans=0.09899494936611666 2024-08-10 00:22:28,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12900, loss[loss=0.1247, beats_loss=0.0128, ecapa_loss=0.0003036, whisper_loss=0.1088, over 22408.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01265, ecapa_loss=0.0003139, whisper_loss=0.1003, over 3846223.82 frames. ], batch size: 91, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:22:31,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.013e+01 3.364e+01 3.931e+01 6.029e+01, threshold=6.729e+01, percent-clipped=0.0 2024-08-10 00:22:32,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2024-08-10 00:22:41,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=273980.0, ans=0.02 2024-08-10 00:23:05,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=274180.0, ans=0.125 2024-08-10 00:23:06,560 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 00:23:09,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.65 vs. limit=5.0 2024-08-10 00:23:35,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=274380.0, ans=0.0 2024-08-10 00:23:40,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 12950, loss[loss=0.1095, beats_loss=0.009424, ecapa_loss=0.000388, whisper_loss=0.09618, over 15862.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01256, ecapa_loss=0.0003151, whisper_loss=0.1001, over 3832107.79 frames. ], batch size: 65, lr: 2.27e-02, grad_scale: 262144.0 2024-08-10 00:23:47,527 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 00:24:17,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=274680.0, ans=0.0 2024-08-10 00:24:23,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-10 00:24:48,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=274880.0, ans=0.2 2024-08-10 00:24:50,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13000, loss[loss=0.1249, beats_loss=0.009615, ecapa_loss=0.0003075, whisper_loss=0.1122, over 18017.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01242, ecapa_loss=0.0003172, whisper_loss=0.09999, over 3814025.92 frames. ], batch size: 67, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:24:53,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.907e+01 3.154e+01 3.704e+01 5.779e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 00:24:56,181 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 00:24:58,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=274980.0, ans=10.0 2024-08-10 00:25:32,948 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 00:25:43,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2024-08-10 00:26:01,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13050, loss[loss=0.121, beats_loss=0.01099, ecapa_loss=0.0003012, whisper_loss=0.107, over 22887.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01232, ecapa_loss=0.0003174, whisper_loss=0.1004, over 3814342.13 frames. ], batch size: 91, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:26:14,808 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.198e-01 2024-08-10 00:26:22,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0 2024-08-10 00:26:23,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2024-08-10 00:26:32,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=275680.0, ans=0.07 2024-08-10 00:26:54,826 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 00:27:12,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13100, loss[loss=0.1122, beats_loss=0.01146, ecapa_loss=0.0003636, whisper_loss=0.09711, over 21441.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01228, ecapa_loss=0.0003152, whisper_loss=0.101, over 3839346.32 frames. 
], batch size: 87, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:27:14,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.977e+01 3.328e+01 3.884e+01 7.929e+01, threshold=6.656e+01, percent-clipped=3.0 2024-08-10 00:27:33,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=276080.0, ans=0.125 2024-08-10 00:27:42,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=12.0 2024-08-10 00:27:53,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=276280.0, ans=0.125 2024-08-10 00:27:55,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.14 vs. limit=10.0 2024-08-10 00:28:00,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=276280.0, ans=0.0 2024-08-10 00:28:21,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=276380.0, ans=0.0 2024-08-10 00:28:23,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13150, loss[loss=0.138, beats_loss=0.01244, ecapa_loss=0.0002846, whisper_loss=0.1227, over 23085.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01239, ecapa_loss=0.000314, whisper_loss=0.1009, over 3853996.73 frames. ], batch size: 89, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:28:23,585 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 00:28:28,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=276480.0, ans=0.125 2024-08-10 00:28:39,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276580.0, ans=0.125 2024-08-10 00:28:48,819 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 00:28:54,648 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-10 00:29:05,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=276780.0, ans=0.0 2024-08-10 00:29:08,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=276780.0, ans=0.125 2024-08-10 00:29:10,210 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 00:29:15,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=276780.0, ans=0.2 2024-08-10 00:29:17,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=276780.0, ans=0.0 2024-08-10 00:29:32,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=276980.0, ans=0.125 2024-08-10 00:29:33,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13200, loss[loss=0.1138, beats_loss=0.01361, ecapa_loss=0.0002554, whisper_loss=0.09767, over 15446.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01245, ecapa_loss=0.0003132, whisper_loss=0.1005, over 3851539.92 frames. 
], batch size: 57, lr: 2.26e-02, grad_scale: 262144.0 2024-08-10 00:29:36,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 3.048e+01 3.557e+01 4.616e+01 6.724e+01, threshold=7.115e+01, percent-clipped=1.0 2024-08-10 00:29:36,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=276980.0, ans=0.125 2024-08-10 00:29:37,960 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 00:29:48,878 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 00:29:55,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=277080.0, ans=0.125 2024-08-10 00:30:11,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.44 vs. limit=15.0 2024-08-10 00:30:43,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13250, loss[loss=0.1135, beats_loss=0.01404, ecapa_loss=0.0003371, whisper_loss=0.09614, over 14189.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01244, ecapa_loss=0.0003131, whisper_loss=0.09979, over 3800657.47 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:30:43,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=277480.0, ans=0.125 2024-08-10 00:30:47,578 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 00:30:51,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=277480.0, ans=0.0 2024-08-10 00:30:59,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=277580.0, ans=0.125 2024-08-10 00:31:00,763 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 00:31:08,622 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 00:31:19,343 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 00:31:26,466 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 00:31:42,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=277880.0, ans=0.125 2024-08-10 00:31:55,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=277980.0, ans=0.0 2024-08-10 00:31:56,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13300, loss[loss=0.09328, beats_loss=0.01466, ecapa_loss=0.000263, whisper_loss=0.07599, over 16390.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.0125, ecapa_loss=0.0003138, whisper_loss=0.09976, over 3816039.31 frames. ], batch size: 64, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:31:58,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=277980.0, ans=0.0 2024-08-10 00:31:59,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.409e+01 2.953e+01 3.236e+01 3.823e+01 6.068e+01, threshold=6.472e+01, percent-clipped=0.0 2024-08-10 00:32:06,529 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 00:32:18,849 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 00:32:22,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278080.0, ans=0.1 2024-08-10 00:32:26,789 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 00:32:56,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=278280.0, ans=0.07 2024-08-10 00:33:00,924 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:33:14,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13350, loss[loss=0.09493, beats_loss=0.01194, ecapa_loss=0.0002815, whisper_loss=0.08018, over 17120.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01253, ecapa_loss=0.000313, whisper_loss=0.0998, over 3794959.04 frames. ], batch size: 65, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:33:14,549 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 11 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 00:33:27,451 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 00:33:27,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278580.0, ans=0.1 2024-08-10 00:33:57,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=278680.0, ans=0.2 2024-08-10 00:34:11,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=278780.0, ans=0.125 2024-08-10 00:34:16,473 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 00:34:31,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13400, loss[loss=0.1044, beats_loss=0.01534, ecapa_loss=0.0002611, whisper_loss=0.08649, over 13946.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01248, ecapa_loss=0.000314, whisper_loss=0.1003, over 3823761.88 frames. ], batch size: 56, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:34:32,042 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 00:34:34,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.868e+01 3.242e+01 3.595e+01 7.666e+01, threshold=6.483e+01, percent-clipped=2.0 2024-08-10 00:34:58,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=279080.0, ans=0.2 2024-08-10 00:35:11,038 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 11 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 00:35:15,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=279180.0, ans=0.125 2024-08-10 00:35:30,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-10 00:35:44,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=279380.0, ans=0.125 2024-08-10 00:35:46,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=279380.0, ans=0.0 2024-08-10 00:35:48,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13450, loss[loss=0.1215, beats_loss=0.01227, ecapa_loss=0.0003575, whisper_loss=0.1057, over 17163.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01257, ecapa_loss=0.0003134, whisper_loss=0.09974, over 3806231.15 frames. 
], batch size: 71, lr: 2.25e-02, grad_scale: 262144.0 2024-08-10 00:36:19,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2024-08-10 00:36:19,935 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 00:36:24,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=279680.0, ans=0.125 2024-08-10 00:36:24,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=279680.0, ans=0.125 2024-08-10 00:37:06,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13500, loss[loss=0.1178, beats_loss=0.01354, ecapa_loss=0.0003147, whisper_loss=0.1011, over 22906.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01246, ecapa_loss=0.0003173, whisper_loss=0.1005, over 3838649.01 frames. ], batch size: 94, lr: 2.24e-02, grad_scale: 262144.0 2024-08-10 00:37:09,044 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-28000.pt 2024-08-10 00:37:13,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.053e+01 3.516e+01 4.040e+01 7.643e+01, threshold=7.031e+01, percent-clipped=3.0 2024-08-10 00:37:17,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=279980.0, ans=0.0 2024-08-10 00:37:55,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=280280.0, ans=0.125 2024-08-10 00:37:58,984 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 00:38:01,631 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 00:38:07,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=280280.0, ans=0.125 2024-08-10 00:38:24,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13550, loss[loss=0.1378, beats_loss=0.01076, ecapa_loss=0.0002879, whisper_loss=0.1241, over 20739.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01252, ecapa_loss=0.0003157, whisper_loss=0.101, over 3842968.89 frames. ], batch size: 78, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:38:29,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=280480.0, ans=0.5 2024-08-10 00:38:31,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. limit=10.0 2024-08-10 00:38:35,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=280480.0, ans=0.07 2024-08-10 00:38:37,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-08-10 00:38:38,426 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 00:38:57,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=280680.0, ans=0.09899494936611666 2024-08-10 00:39:08,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.65 vs. 
limit=22.5 2024-08-10 00:39:10,934 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 00:39:18,611 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 25 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-10 00:39:20,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=280780.0, ans=0.2 2024-08-10 00:39:28,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-10 00:39:41,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13600, loss[loss=0.1141, beats_loss=0.01211, ecapa_loss=0.0003172, whisper_loss=0.09885, over 22171.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01254, ecapa_loss=0.000314, whisper_loss=0.1011, over 3879343.21 frames. ], batch size: 89, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:39:42,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=280980.0, ans=10.0 2024-08-10 00:39:43,758 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 00:39:44,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.967e+01 3.461e+01 3.946e+01 7.975e+01, threshold=6.923e+01, percent-clipped=1.0 2024-08-10 00:40:35,951 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 00:40:46,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2024-08-10 00:40:46,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. 
limit=15.0 2024-08-10 00:40:59,404 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 00:41:00,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13650, loss[loss=0.1316, beats_loss=0.01091, ecapa_loss=0.0002524, whisper_loss=0.1181, over 23468.00 frames. ], tot_loss[loss=0.117, beats_loss=0.01264, ecapa_loss=0.0003107, whisper_loss=0.1012, over 3904678.91 frames. ], batch size: 88, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:41:04,931 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 00:41:33,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=281680.0, ans=0.125 2024-08-10 00:41:54,356 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 00:42:09,636 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 00:42:13,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=281880.0, ans=0.2 2024-08-10 00:42:13,656 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2024-08-10 00:42:17,709 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 00:42:22,452 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13700, loss[loss=0.1388, beats_loss=0.009233, ecapa_loss=0.0003192, whisper_loss=0.1263, over 14767.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01258, ecapa_loss=0.0003121, whisper_loss=0.1011, over 3876034.99 frames. 
], batch size: 56, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:42:24,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281980.0, ans=0.125 2024-08-10 00:42:25,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 2.951e+01 3.261e+01 3.919e+01 6.807e+01, threshold=6.522e+01, percent-clipped=0.0 2024-08-10 00:42:25,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=281980.0, ans=0.125 2024-08-10 00:42:30,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=12.0 2024-08-10 00:42:34,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281980.0, ans=0.1 2024-08-10 00:42:43,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282080.0, ans=0.125 2024-08-10 00:42:58,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=282180.0, ans=0.0 2024-08-10 00:43:40,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282380.0, ans=0.1 2024-08-10 00:43:44,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13750, loss[loss=0.1077, beats_loss=0.01555, ecapa_loss=0.0002776, whisper_loss=0.08935, over 17485.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01263, ecapa_loss=0.0003131, whisper_loss=0.1005, over 3849804.66 frames. ], batch size: 72, lr: 2.24e-02, grad_scale: 524288.0 2024-08-10 00:43:50,530 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 00:43:55,761 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.476e-02 2024-08-10 00:44:00,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=282580.0, ans=0.125 2024-08-10 00:44:05,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=282580.0, ans=0.125 2024-08-10 00:44:06,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=282580.0, ans=0.125 2024-08-10 00:44:24,074 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 00:44:30,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=282780.0, ans=0.015 2024-08-10 00:44:32,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=282780.0, ans=0.0 2024-08-10 00:44:35,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-10 00:44:43,520 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 00:44:43,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=282780.0, ans=0.04949747468305833 2024-08-10 00:44:53,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=282880.0, ans=0.0 2024-08-10 00:45:02,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13800, loss[loss=0.08803, beats_loss=0.01722, ecapa_loss=0.0002637, whisper_loss=0.06817, over 21424.00 frames. 
], tot_loss[loss=0.116, beats_loss=0.01259, ecapa_loss=0.0003112, whisper_loss=0.1003, over 3877872.23 frames. ], batch size: 88, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:45:06,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.944e+01 3.294e+01 3.829e+01 5.391e+01, threshold=6.589e+01, percent-clipped=0.0 2024-08-10 00:45:08,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282980.0, ans=0.1 2024-08-10 00:45:12,084 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 00:45:22,951 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 00:45:24,324 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 00:45:28,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=12.0 2024-08-10 00:46:25,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13850, loss[loss=0.1067, beats_loss=0.01401, ecapa_loss=0.0003327, whisper_loss=0.0894, over 21793.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01264, ecapa_loss=0.0003085, whisper_loss=0.1004, over 3887890.08 frames. ], batch size: 93, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:46:29,334 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 00:46:37,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283480.0, ans=0.1 2024-08-10 00:47:06,521 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 00:47:10,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=283680.0, ans=0.125 2024-08-10 00:47:13,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=283780.0, ans=0.125 2024-08-10 00:47:33,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283880.0, ans=0.1 2024-08-10 00:47:36,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=283880.0, ans=0.2 2024-08-10 00:47:38,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2024-08-10 00:47:45,616 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-10 00:47:47,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13900, loss[loss=0.1131, beats_loss=0.01504, ecapa_loss=0.0002695, whisper_loss=0.09538, over 22711.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01256, ecapa_loss=0.0003093, whisper_loss=0.1005, over 3888341.23 frames. ], batch size: 91, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:47:50,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.945e+01 3.348e+01 3.878e+01 5.863e+01, threshold=6.696e+01, percent-clipped=0.0 2024-08-10 00:47:54,664 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
25 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-10 00:48:12,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=284080.0, ans=0.0 2024-08-10 00:48:12,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-10 00:48:27,418 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 00:48:30,089 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 00:48:41,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=284280.0, ans=0.2 2024-08-10 00:48:45,825 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 29 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-10 00:49:07,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=284380.0, ans=0.125 2024-08-10 00:49:09,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 13950, loss[loss=0.1202, beats_loss=0.01519, ecapa_loss=0.0002562, whisper_loss=0.1024, over 21826.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01249, ecapa_loss=0.0003099, whisper_loss=0.1014, over 3897660.28 frames. ], batch size: 89, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:49:35,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=284580.0, ans=0.0 2024-08-10 00:49:45,504 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 00:49:52,155 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 00:49:56,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=284680.0, ans=0.125 2024-08-10 00:49:57,525 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 00:50:33,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14000, loss[loss=0.1211, beats_loss=0.01064, ecapa_loss=0.0003486, whisper_loss=0.107, over 16945.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01248, ecapa_loss=0.0003098, whisper_loss=0.1013, over 3890295.35 frames. ], batch size: 69, lr: 2.23e-02, grad_scale: 524288.0 2024-08-10 00:50:35,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.957e+01 3.357e+01 3.952e+01 6.248e+01, threshold=6.715e+01, percent-clipped=0.0 2024-08-10 00:50:46,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2024-08-10 00:50:51,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=15.0 2024-08-10 00:50:54,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=285080.0, ans=0.0 2024-08-10 00:50:56,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-08-10 00:51:02,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285080.0, ans=0.1 2024-08-10 00:51:10,682 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 00:51:19,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285180.0, ans=0.1 2024-08-10 00:51:21,935 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.883e-02 2024-08-10 00:51:25,103 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 00:51:35,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=285280.0, ans=0.125 2024-08-10 00:51:54,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14050, loss[loss=0.1005, beats_loss=0.01309, ecapa_loss=0.0003259, whisper_loss=0.08415, over 16963.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01243, ecapa_loss=0.0003099, whisper_loss=0.1014, over 3894418.79 frames. ], batch size: 69, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:51:55,860 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 00:52:06,635 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 00:52:16,294 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 00:52:38,515 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 00:52:39,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=285780.0, ans=0.0 2024-08-10 00:52:46,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=285780.0, ans=0.125 2024-08-10 00:52:49,500 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 00:53:11,622 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
29 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 00:53:12,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.02 vs. limit=10.0 2024-08-10 00:53:15,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14100, loss[loss=0.1043, beats_loss=0.01487, ecapa_loss=0.0002824, whisper_loss=0.08657, over 18510.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01249, ecapa_loss=0.0003085, whisper_loss=0.1007, over 3880900.47 frames. ], batch size: 75, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:53:18,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 2.998e+01 3.654e+01 4.043e+01 1.341e+02, threshold=7.307e+01, percent-clipped=1.0 2024-08-10 00:53:28,988 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 00:53:30,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=286080.0, ans=0.0 2024-08-10 00:53:47,142 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 00:53:53,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=286180.0, ans=0.05 2024-08-10 00:54:22,822 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 00:54:35,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14150, loss[loss=0.1128, beats_loss=0.01084, ecapa_loss=0.0002949, whisper_loss=0.09899, over 18093.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01251, ecapa_loss=0.000306, whisper_loss=0.1009, over 3875507.29 frames. 
], batch size: 70, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:54:43,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286480.0, ans=0.0 2024-08-10 00:54:50,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=286480.0, ans=0.2 2024-08-10 00:54:52,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=286580.0, ans=0.125 2024-08-10 00:55:15,893 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 00:55:28,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=286780.0, ans=0.0 2024-08-10 00:55:30,815 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 00:55:46,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=286880.0, ans=0.2 2024-08-10 00:55:49,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=286880.0, ans=0.2 2024-08-10 00:55:53,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14200, loss[loss=0.09681, beats_loss=0.01222, ecapa_loss=0.0002854, whisper_loss=0.08173, over 17766.00 frames. ], tot_loss[loss=0.116, beats_loss=0.0125, ecapa_loss=0.0003054, whisper_loss=0.1005, over 3901122.64 frames. 
], batch size: 69, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:55:54,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=286980.0, ans=0.0 2024-08-10 00:55:58,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 3.000e+01 3.388e+01 3.894e+01 5.742e+01, threshold=6.776e+01, percent-clipped=0.0 2024-08-10 00:56:05,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2024-08-10 00:56:07,667 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 00:56:11,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-10 00:56:18,239 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 00:56:36,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-08-10 00:56:56,984 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 00:57:03,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=287280.0, ans=0.125 2024-08-10 00:57:08,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=287280.0, ans=0.0 2024-08-10 00:57:27,717 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 00:57:38,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14250, loss[loss=0.1198, beats_loss=0.01527, ecapa_loss=0.0003466, whisper_loss=0.101, over 16846.00 frames. 
], tot_loss[loss=0.1155, beats_loss=0.01257, ecapa_loss=0.0003049, whisper_loss=0.09989, over 3899255.45 frames. ], batch size: 68, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:57:38,723 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 00:57:46,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287480.0, ans=0.1 2024-08-10 00:58:09,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=287580.0, ans=0.0 2024-08-10 00:58:14,684 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 8 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 00:58:17,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=287680.0, ans=0.125 2024-08-10 00:58:39,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-10 00:58:41,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287780.0, ans=0.1 2024-08-10 00:59:00,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.033e+00 2024-08-10 00:59:14,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14300, loss[loss=0.1102, beats_loss=0.01335, ecapa_loss=0.0003011, whisper_loss=0.0938, over 21991.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.0126, ecapa_loss=0.000305, whisper_loss=0.09993, over 3889468.45 frames. ], batch size: 89, lr: 2.22e-02, grad_scale: 524288.0 2024-08-10 00:59:14,311 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
38 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 00:59:19,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+01 3.147e+01 3.620e+01 4.133e+01 1.421e+02, threshold=7.240e+01, percent-clipped=1.0 2024-08-10 00:59:29,669 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 00:59:32,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=287980.0, ans=0.125 2024-08-10 00:59:45,163 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-10 01:00:15,134 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 01:00:15,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.99 vs. limit=22.5 2024-08-10 01:00:31,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288280.0, ans=0.125 2024-08-10 01:01:12,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14350, loss[loss=0.1479, beats_loss=0.009335, ecapa_loss=0.0003283, whisper_loss=0.1353, over 18332.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01261, ecapa_loss=0.0003055, whisper_loss=0.1004, over 3899778.78 frames. ], batch size: 70, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:02:04,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288680.0, ans=0.1 2024-08-10 01:02:11,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5 2024-08-10 01:02:15,165 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 01:02:26,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288780.0, ans=0.1 2024-08-10 01:02:51,769 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 01:03:02,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=288880.0, ans=0.2 2024-08-10 01:03:03,966 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 01:03:06,525 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 01:03:08,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14400, loss[loss=0.1434, beats_loss=0.01072, ecapa_loss=0.0003483, whisper_loss=0.1292, over 22260.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01258, ecapa_loss=0.0003062, whisper_loss=0.1005, over 3917261.52 frames. ], batch size: 90, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:03:13,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.997e+01 3.365e+01 3.798e+01 7.821e+01, threshold=6.729e+01, percent-clipped=1.0 2024-08-10 01:04:14,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. 
limit=15.0 2024-08-10 01:04:18,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=289280.0, ans=0.0 2024-08-10 01:04:19,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=289280.0, ans=0.0 2024-08-10 01:04:26,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=289280.0, ans=0.125 2024-08-10 01:04:32,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=289380.0, ans=0.07 2024-08-10 01:04:40,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=289380.0, ans=0.2 2024-08-10 01:04:42,482 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 01:04:43,789 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 01:04:45,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 2, batch 14450, loss[loss=0.127, beats_loss=0.01094, ecapa_loss=0.0003002, whisper_loss=0.1131, over 23695.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.0126, ecapa_loss=0.000308, whisper_loss=0.1008, over 3922967.28 frames. ], batch size: 89, lr: 2.21e-02, grad_scale: 524288.0 2024-08-10 01:04:53,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=289480.0, ans=0.1 2024-08-10 01:04:56,077 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
23 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-10 01:04:59,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=289580.0, ans=0.125 2024-08-10 01:04:59,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.13 vs. limit=22.5 2024-08-10 01:05:06,073 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 01:05:10,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289580.0, ans=0.1 2024-08-10 01:05:16,195 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 01:05:18,699 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 01:05:22,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289680.0, ans=0.1 2024-08-10 01:05:22,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=289680.0, ans=0.2 2024-08-10 01:05:38,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289780.0, ans=0.125 2024-08-10 01:05:47,757 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-2.pt 2024-08-10 01:06:23,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 0, loss[loss=0.09066, beats_loss=0.0114, ecapa_loss=0.0003238, whisper_loss=0.07602, over 15879.00 frames. 
], tot_loss[loss=0.09066, beats_loss=0.0114, ecapa_loss=0.0003238, whisper_loss=0.07602, over 15879.00 frames. ], batch size: 62, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:06:23,826 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 01:07:07,554 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on ASR_libri: loss=0.2782, beats_loss=0, ecapa_loss=0.0009143, whisper_loss=0.2691, over 922467.00 frames. 2024-08-10 01:07:22,265 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8502, 2.7284, 2.0146, 3.0042], device='cuda:0') 2024-08-10 01:07:23,565 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on SV_voxceleb1: loss=0.008083, beats_loss=0, ecapa_loss=0.0008083, whisper_loss=0, over 939242.00 frames. 2024-08-10 01:09:27,972 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on AT_audioset: loss=0.02889, beats_loss=0.02889, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 01:09:27,976 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 01:09:49,139 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 01:10:02,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 3.015e+01 3.420e+01 3.932e+01 5.377e+01, threshold=6.841e+01, percent-clipped=0.0 2024-08-10 01:10:22,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289980.0, ans=0.125 2024-08-10 01:10:29,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290080.0, ans=0.1 2024-08-10 01:10:35,655 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 01:10:49,492 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 01:11:36,987 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 01:11:42,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 50, loss[loss=0.1391, beats_loss=0.01079, ecapa_loss=0.0002495, whisper_loss=0.1259, over 18155.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01257, ecapa_loss=0.0003146, whisper_loss=0.1001, over 878518.42 frames. ], batch size: 66, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:11:42,443 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:11:55,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=290380.0, ans=0.0 2024-08-10 01:12:03,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=290380.0, ans=0.2 2024-08-10 01:12:23,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=290480.0, ans=0.2 2024-08-10 01:12:44,896 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 01:13:16,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=290680.0, ans=0.125 2024-08-10 01:13:26,778 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 01:13:27,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-10 01:13:33,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.66 vs. 
limit=5.0 2024-08-10 01:13:48,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 100, loss[loss=0.09542, beats_loss=0.01473, ecapa_loss=0.0002625, whisper_loss=0.07807, over 15265.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01274, ecapa_loss=0.0003084, whisper_loss=0.09786, over 1516407.12 frames. ], batch size: 62, lr: 2.10e-02, grad_scale: 524288.0 2024-08-10 01:14:07,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=290880.0, ans=0.1 2024-08-10 01:14:12,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=290980.0, ans=0.0 2024-08-10 01:14:18,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.835e+01 4.447e+01 6.801e+01, threshold=7.671e+01, percent-clipped=0.0 2024-08-10 01:14:31,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=15.0 2024-08-10 01:14:56,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=291080.0, ans=0.125 2024-08-10 01:15:03,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2024-08-10 01:15:27,365 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-10 01:15:31,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=291280.0, ans=0.125 2024-08-10 01:15:33,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=291280.0, ans=0.125 2024-08-10 01:15:42,943 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 01:15:43,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-08-10 01:15:44,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 150, loss[loss=0.1301, beats_loss=0.01171, ecapa_loss=0.0003092, whisper_loss=0.1153, over 19196.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01271, ecapa_loss=0.0003003, whisper_loss=0.09872, over 2035579.32 frames. ], batch size: 79, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:16:14,417 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 01:16:39,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=291680.0, ans=0.125 2024-08-10 01:16:43,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.66 vs. limit=10.0 2024-08-10 01:16:54,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=291780.0, ans=0.125 2024-08-10 01:17:11,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 200, loss[loss=0.0925, beats_loss=0.01498, ecapa_loss=0.0002544, whisper_loss=0.07497, over 14773.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01252, ecapa_loss=0.0003012, whisper_loss=0.09907, over 2401893.82 frames. 
], batch size: 57, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:17:31,883 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.029e+01 3.361e+01 3.912e+01 9.673e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 01:17:36,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291980.0, ans=0.1 2024-08-10 01:17:56,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=8.0 2024-08-10 01:18:12,886 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 01:18:22,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2024-08-10 01:18:31,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 250, loss[loss=0.09503, beats_loss=0.01309, ecapa_loss=0.0002733, whisper_loss=0.07922, over 17264.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01238, ecapa_loss=0.0002996, whisper_loss=0.09963, over 2715206.64 frames. ], batch size: 69, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:18:48,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=292480.0, ans=0.125 2024-08-10 01:18:51,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292480.0, ans=0.1 2024-08-10 01:18:51,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292480.0, ans=0.1 2024-08-10 01:19:03,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-08-10 01:19:12,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292580.0, ans=0.125 2024-08-10 01:19:14,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=292580.0, ans=0.0 2024-08-10 01:19:30,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=292780.0, ans=10.0 2024-08-10 01:19:47,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 300, loss[loss=0.13, beats_loss=0.0121, ecapa_loss=0.0002956, whisper_loss=0.115, over 23350.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01243, ecapa_loss=0.0002961, whisper_loss=0.0991, over 2960760.22 frames. ], batch size: 93, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:19:51,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292880.0, ans=0.125 2024-08-10 01:20:05,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=292980.0, ans=0.0 2024-08-10 01:20:06,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 3.157e+01 3.521e+01 4.168e+01 6.266e+01, threshold=7.043e+01, percent-clipped=0.0 2024-08-10 01:20:08,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=292980.0, ans=0.0 2024-08-10 01:20:08,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=292980.0, ans=0.2 2024-08-10 01:20:27,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=293080.0, ans=0.125 2024-08-10 01:20:29,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, 
batch_count=293080.0, ans=0.125 2024-08-10 01:21:01,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=293280.0, ans=0.125 2024-08-10 01:21:06,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 350, loss[loss=0.1217, beats_loss=0.01065, ecapa_loss=0.0003277, whisper_loss=0.1078, over 21312.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01249, ecapa_loss=0.0002948, whisper_loss=0.09868, over 3149356.00 frames. ], batch size: 84, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:21:37,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=293580.0, ans=0.0 2024-08-10 01:21:57,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-10 01:22:02,897 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.327e+04 2024-08-10 01:22:21,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 400, loss[loss=0.1076, beats_loss=0.01356, ecapa_loss=0.0002927, whisper_loss=0.0911, over 21528.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01234, ecapa_loss=0.0002944, whisper_loss=0.09926, over 3291028.07 frames. ], batch size: 86, lr: 2.09e-02, grad_scale: 524288.0 2024-08-10 01:22:29,413 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 01:22:35,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=293980.0, ans=0.125 2024-08-10 01:22:39,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 2.898e+01 3.177e+01 4.000e+01 8.293e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 01:22:41,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=293980.0, ans=0.2 2024-08-10 01:23:03,552 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 01:23:30,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=294280.0, ans=0.0 2024-08-10 01:23:30,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=294280.0, ans=0.125 2024-08-10 01:23:37,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 450, loss[loss=0.1176, beats_loss=0.01172, ecapa_loss=0.0002772, whisper_loss=0.1031, over 17215.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.0123, ecapa_loss=0.0002931, whisper_loss=0.09939, over 3428004.73 frames. ], batch size: 68, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:23:58,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2024-08-10 01:24:02,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.67 vs. 
limit=15.0 2024-08-10 01:24:22,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=294680.0, ans=0.0 2024-08-10 01:24:41,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=294780.0, ans=0.125 2024-08-10 01:24:49,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294780.0, ans=0.125 2024-08-10 01:24:49,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=294780.0, ans=0.0 2024-08-10 01:24:52,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 500, loss[loss=0.1037, beats_loss=0.009262, ecapa_loss=0.0003467, whisper_loss=0.09096, over 18349.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01214, ecapa_loss=0.0002942, whisper_loss=0.09938, over 3502307.96 frames. ], batch size: 73, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:24:55,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294880.0, ans=0.125 2024-08-10 01:25:08,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=294980.0, ans=0.035 2024-08-10 01:25:09,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2024-08-10 01:25:09,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.966e+01 3.370e+01 3.826e+01 6.580e+01, threshold=6.739e+01, percent-clipped=1.0 2024-08-10 01:25:15,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=294980.0, ans=0.0 2024-08-10 01:25:27,520 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 01:25:31,846 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 01:25:33,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=295080.0, ans=0.2 2024-08-10 01:25:40,233 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 01:26:05,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 550, loss[loss=0.1045, beats_loss=0.01347, ecapa_loss=0.0002963, whisper_loss=0.08804, over 21462.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01233, ecapa_loss=0.0002908, whisper_loss=0.09817, over 3579165.99 frames. ], batch size: 87, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:26:05,343 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 01:26:07,990 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 01:26:15,366 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 21 from Vox, 14 fro AS 2024-08-10 01:26:24,412 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 01:26:30,382 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.650e+03 2024-08-10 01:26:38,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. 
limit=10.0 2024-08-10 01:27:10,955 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:27:15,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=295780.0, ans=0.0 2024-08-10 01:27:17,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=295780.0, ans=0.2 2024-08-10 01:27:20,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 600, loss[loss=0.1047, beats_loss=0.01244, ecapa_loss=0.0002799, whisper_loss=0.08947, over 14973.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01219, ecapa_loss=0.0002913, whisper_loss=0.09871, over 3631893.21 frames. ], batch size: 57, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:27:26,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2024-08-10 01:27:27,653 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 01:27:28,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=295880.0, ans=0.0 2024-08-10 01:27:30,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295880.0, ans=0.1 2024-08-10 01:27:34,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=295980.0, ans=0.0 2024-08-10 01:27:38,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.875e+01 3.342e+01 3.961e+01 6.306e+01, threshold=6.685e+01, percent-clipped=0.0 2024-08-10 01:28:02,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=296080.0, ans=0.125 2024-08-10 01:28:04,290 INFO 
[train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 01:28:05,481 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 01:28:16,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=296180.0, ans=0.125 2024-08-10 01:28:20,721 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 01:28:22,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=296280.0, ans=0.125 2024-08-10 01:28:28,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2024-08-10 01:28:36,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 650, loss[loss=0.1141, beats_loss=0.009106, ecapa_loss=0.0003039, whisper_loss=0.102, over 21803.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01229, ecapa_loss=0.0002902, whisper_loss=0.09848, over 3683506.71 frames. 
], batch size: 85, lr: 2.08e-02, grad_scale: 524288.0 2024-08-10 01:28:42,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296380.0, ans=0.125 2024-08-10 01:28:47,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296380.0, ans=0.0 2024-08-10 01:28:50,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=296480.0, ans=0.2 2024-08-10 01:28:50,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=296480.0, ans=0.0 2024-08-10 01:28:59,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=296480.0, ans=0.125 2024-08-10 01:29:13,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296580.0, ans=0.125 2024-08-10 01:29:38,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=296780.0, ans=0.2 2024-08-10 01:29:41,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296780.0, ans=0.125 2024-08-10 01:29:44,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296780.0, ans=0.1 2024-08-10 01:29:48,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 700, loss[loss=0.1163, beats_loss=0.01215, ecapa_loss=0.0003199, whisper_loss=0.101, over 13193.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01227, ecapa_loss=0.0002896, whisper_loss=0.09912, over 3699593.86 frames. 
], batch size: 54, lr: 2.08e-02, grad_scale: 524288.0
2024-08-10 01:30:07,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 2.824e+01 3.267e+01 4.012e+01 5.256e+01, threshold=6.535e+01, percent-clipped=0.0
2024-08-10 01:30:09,693 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS
2024-08-10 01:30:15,732 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 13 from Vox, 26 from AS
2024-08-10 01:30:41,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=297180.0, ans=0.0
2024-08-10 01:30:54,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=297280.0, ans=0.125
2024-08-10 01:30:59,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0
2024-08-10 01:31:01,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297280.0, ans=0.1
2024-08-10 01:31:02,768 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 11 from Vox, 47 from AS
2024-08-10 01:31:05,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 750, loss[loss=0.1103, beats_loss=0.01384, ecapa_loss=0.0002554, whisper_loss=0.09394, over 17956.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01225, ecapa_loss=0.0002875, whisper_loss=0.09975, over 3757646.25 frames. ], batch size: 71, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:31:09,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297380.0, ans=0.125
2024-08-10 01:31:11,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0
2024-08-10 01:31:25,561 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 from AS
2024-08-10 01:31:32,532 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 from AS
2024-08-10 01:31:35,229 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 11 from Vox, 25 from AS
2024-08-10 01:31:39,706 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 from AS
2024-08-10 01:31:41,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=297580.0, ans=0.0
2024-08-10 01:31:51,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0
2024-08-10 01:31:56,794 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 12 from Vox, 30 from AS
2024-08-10 01:32:14,431 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 from AS
2024-08-10 01:32:18,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 800, loss[loss=0.1039, beats_loss=0.01037, ecapa_loss=0.0003203, whisper_loss=0.09036, over 16967.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01225, ecapa_loss=0.0002844, whisper_loss=0.09954, over 3786446.91 frames. ], batch size: 69, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:32:26,732 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 15 from Vox, 48 from AS
2024-08-10 01:32:35,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 2.843e+01 3.241e+01 3.911e+01 6.650e+01, threshold=6.482e+01, percent-clipped=1.0
2024-08-10 01:32:36,073 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 from AS
2024-08-10 01:32:39,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=297980.0, ans=0.125
2024-08-10 01:32:44,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=297980.0, ans=0.125
2024-08-10 01:33:22,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298280.0, ans=0.1
2024-08-10 01:33:31,857 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 01:33:32,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0
2024-08-10 01:33:33,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 850, loss[loss=0.1393, beats_loss=0.01095, ecapa_loss=0.0003269, whisper_loss=0.1251, over 22854.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01224, ecapa_loss=0.0002856, whisper_loss=0.09937, over 3808471.96 frames. ], batch size: 90, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:33:40,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5
2024-08-10 01:33:42,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=298380.0, ans=0.125
2024-08-10 01:34:02,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298580.0, ans=0.125
2024-08-10 01:34:06,581 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 from AS
2024-08-10 01:34:12,297 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 from AS
2024-08-10 01:34:16,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5
2024-08-10 01:34:36,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=298780.0, ans=0.07
2024-08-10 01:34:41,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=298780.0, ans=0.125
2024-08-10 01:34:43,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=298780.0, ans=0.2
2024-08-10 01:34:48,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 900, loss[loss=0.09807, beats_loss=0.01302, ecapa_loss=0.0002956, whisper_loss=0.08209, over 21524.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01221, ecapa_loss=0.0002867, whisper_loss=0.09955, over 3831297.21 frames. ], batch size: 88, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:34:52,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298880.0, ans=0.1
2024-08-10 01:34:54,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=298880.0, ans=0.07
2024-08-10 01:34:55,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=298880.0, ans=10.0
2024-08-10 01:34:57,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.85 vs. limit=15.0
2024-08-10 01:35:03,646 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS
2024-08-10 01:35:06,181 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.811e+01 3.274e+01 3.784e+01 5.899e+01, threshold=6.548e+01, percent-clipped=0.0
2024-08-10 01:35:09,417 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 01:35:19,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=299080.0, ans=0.125
2024-08-10 01:35:47,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299280.0, ans=0.1
2024-08-10 01:35:51,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=299280.0, ans=0.125
2024-08-10 01:35:53,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299280.0, ans=0.1
2024-08-10 01:35:59,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=299280.0, ans=0.125
2024-08-10 01:36:03,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 950, loss[loss=0.09866, beats_loss=0.01514, ecapa_loss=0.0003148, whisper_loss=0.08038, over 19829.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01226, ecapa_loss=0.0002849, whisper_loss=0.09874, over 3811678.88 frames. ], batch size: 83, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:36:06,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=299380.0, ans=0.0
2024-08-10 01:36:06,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=299380.0, ans=0.1
2024-08-10 01:36:09,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299380.0, ans=0.125
2024-08-10 01:36:16,806 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 12 from Vox, 29 from AS
2024-08-10 01:36:18,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=299480.0, ans=0.125
2024-08-10 01:36:34,014 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 from AS
2024-08-10 01:37:00,051 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-10 01:37:07,239 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 from AS
2024-08-10 01:37:07,550 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.781e-02
2024-08-10 01:37:17,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=299880.0, ans=0.125
2024-08-10 01:37:18,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1000, loss[loss=0.1109, beats_loss=0.01188, ecapa_loss=0.0003249, whisper_loss=0.09573, over 16692.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01233, ecapa_loss=0.0002835, whisper_loss=0.09823, over 3810628.47 frames. ], batch size: 71, lr: 2.07e-02, grad_scale: 524288.0
2024-08-10 01:37:18,963 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 from AS
2024-08-10 01:37:37,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+01 2.926e+01 3.322e+01 3.689e+01 5.712e+01, threshold=6.643e+01, percent-clipped=0.0
2024-08-10 01:37:38,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=299980.0, ans=0.125
2024-08-10 01:37:52,653 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 from AS
2024-08-10 01:37:57,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0
2024-08-10 01:38:07,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=300180.0, ans=0.125
2024-08-10 01:38:24,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300280.0, ans=0.1
2024-08-10 01:38:34,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1050, loss[loss=0.1057, beats_loss=0.01328, ecapa_loss=0.0002464, whisper_loss=0.08993, over 20691.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01233, ecapa_loss=0.0002818, whisper_loss=0.09861, over 3816386.63 frames. ], batch size: 82, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:39:02,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=300480.0, ans=0.0
2024-08-10 01:39:21,778 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 from AS
2024-08-10 01:39:24,692 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 22 from Vox, 25 from AS
2024-08-10 01:39:50,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1100, loss[loss=0.122, beats_loss=0.01295, ecapa_loss=0.0002186, whisper_loss=0.1069, over 17091.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.0123, ecapa_loss=0.0002795, whisper_loss=0.09821, over 3770198.47 frames. ], batch size: 65, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:40:07,327 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 from AS
2024-08-10 01:40:07,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0
2024-08-10 01:40:08,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.873e+01 3.261e+01 3.724e+01 5.464e+01, threshold=6.522e+01, percent-clipped=0.0
2024-08-10 01:40:14,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2024-08-10 01:40:16,081 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 from AS
2024-08-10 01:40:37,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.64 vs. limit=15.0
2024-08-10 01:40:39,820 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 12 from Vox, 38 from AS
2024-08-10 01:40:47,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=301180.0, ans=0.125
2024-08-10 01:40:58,908 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 from AS
2024-08-10 01:41:04,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1150, loss[loss=0.09632, beats_loss=0.01181, ecapa_loss=0.0002764, whisper_loss=0.08174, over 19468.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01233, ecapa_loss=0.0002794, whisper_loss=0.09813, over 3798404.24 frames. ], batch size: 76, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:41:13,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=301380.0, ans=0.0
2024-08-10 01:41:16,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=301380.0, ans=0.035
2024-08-10 01:41:20,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=301480.0, ans=0.0
2024-08-10 01:41:21,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0
2024-08-10 01:41:35,378 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 from AS
2024-08-10 01:41:38,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=301580.0, ans=0.125
2024-08-10 01:41:49,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=301680.0, ans=0.125
2024-08-10 01:41:49,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0
2024-08-10 01:42:10,153 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 from AS
2024-08-10 01:42:12,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0
2024-08-10 01:42:12,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=301780.0, ans=0.05
2024-08-10 01:42:17,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0
2024-08-10 01:42:19,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1200, loss[loss=0.09874, beats_loss=0.01392, ecapa_loss=0.000272, whisper_loss=0.0821, over 21278.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.0124, ecapa_loss=0.0002796, whisper_loss=0.09811, over 3806589.97 frames. ], batch size: 85, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:42:36,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.802e+01 3.225e+01 3.750e+01 6.302e+01, threshold=6.450e+01, percent-clipped=0.0
2024-08-10 01:42:51,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0
2024-08-10 01:42:57,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 01:42:59,633 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS
2024-08-10 01:43:19,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=302280.0, ans=0.09899494936611666
2024-08-10 01:43:33,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1250, loss[loss=0.1502, beats_loss=0.009262, ecapa_loss=0.0002738, whisper_loss=0.1382, over 23858.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01229, ecapa_loss=0.000279, whisper_loss=0.09804, over 3803217.92 frames. ], batch size: 86, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:43:37,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0
2024-08-10 01:44:25,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0
2024-08-10 01:44:31,630 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 01:44:39,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=302780.0, ans=0.125
2024-08-10 01:44:48,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1300, loss[loss=0.1215, beats_loss=0.01109, ecapa_loss=0.0002872, whisper_loss=0.1076, over 19794.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01224, ecapa_loss=0.0002815, whisper_loss=0.09851, over 3803647.68 frames. ], batch size: 78, lr: 2.06e-02, grad_scale: 1048576.0
2024-08-10 01:45:03,969 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS
2024-08-10 01:45:08,212 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.884e+01 3.264e+01 3.595e+01 5.329e+01, threshold=6.528e+01, percent-clipped=0.0
2024-08-10 01:45:18,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=302980.0, ans=0.025
2024-08-10 01:45:25,303 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 21 from Vox, 25 from AS
2024-08-10 01:45:45,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=303180.0, ans=0.125
2024-08-10 01:46:05,351 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS
2024-08-10 01:46:10,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1350, loss[loss=0.1275, beats_loss=0.0128, ecapa_loss=0.0002494, whisper_loss=0.1122, over 24591.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01223, ecapa_loss=0.0002811, whisper_loss=0.09838, over 3811255.20 frames. ], batch size: 93, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:46:10,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=303380.0, ans=0.07
2024-08-10 01:46:10,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=303380.0, ans=0.125
2024-08-10 01:46:21,071 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 24 from Vox, 20 from AS
2024-08-10 01:46:21,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=303380.0, ans=0.125
2024-08-10 01:46:30,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=303480.0, ans=0.2
2024-08-10 01:46:38,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=12.0
2024-08-10 01:46:43,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303580.0, ans=0.0
2024-08-10 01:46:43,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=303580.0, ans=0.1
2024-08-10 01:46:44,527 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 from AS
2024-08-10 01:46:47,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303580.0, ans=0.0
2024-08-10 01:47:05,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0
2024-08-10 01:47:26,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1400, loss[loss=0.1135, beats_loss=0.01274, ecapa_loss=0.0002734, whisper_loss=0.09808, over 18018.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01222, ecapa_loss=0.0002808, whisper_loss=0.09821, over 3793295.21 frames. ], batch size: 74, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:47:35,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303880.0, ans=0.1
2024-08-10 01:47:42,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0
2024-08-10 01:47:44,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.877e+01 3.100e+01 3.641e+01 7.400e+01, threshold=6.199e+01, percent-clipped=1.0
2024-08-10 01:48:18,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=304180.0, ans=0.125
2024-08-10 01:48:24,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0
2024-08-10 01:48:25,220 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS
2024-08-10 01:48:28,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304280.0, ans=0.125
2024-08-10 01:48:38,192 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 from AS
2024-08-10 01:49:10,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1450, loss[loss=0.1309, beats_loss=0.01249, ecapa_loss=0.0002831, whisper_loss=0.1156, over 23344.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01233, ecapa_loss=0.0002801, whisper_loss=0.09793, over 3791024.54 frames. ], batch size: 91, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:49:19,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304380.0, ans=0.1
2024-08-10 01:49:28,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=304480.0, ans=0.125
2024-08-10 01:49:30,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=304480.0, ans=0.125
2024-08-10 01:49:31,514 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 01:49:40,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=304480.0, ans=0.125
2024-08-10 01:49:42,175 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-10 01:49:47,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.96 vs. limit=15.0
2024-08-10 01:49:47,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=12.0
2024-08-10 01:49:48,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304580.0, ans=0.1
2024-08-10 01:49:53,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.19 vs. limit=10.0
2024-08-10 01:50:00,603 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS
2024-08-10 01:50:07,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=304680.0, ans=0.125
2024-08-10 01:50:13,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=304780.0, ans=0.125
2024-08-10 01:50:30,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1500, loss[loss=0.09177, beats_loss=0.01711, ecapa_loss=0.000223, whisper_loss=0.07243, over 19675.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01254, ecapa_loss=0.0002774, whisper_loss=0.09694, over 3780190.72 frames. ], batch size: 79, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:50:45,536 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 from AS
2024-08-10 01:50:49,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.825e+01 3.192e+01 3.755e+01 6.662e+01, threshold=6.384e+01, percent-clipped=1.0
2024-08-10 01:51:09,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0
2024-08-10 01:51:15,477 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 from AS
2024-08-10 01:51:24,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305180.0, ans=0.1
2024-08-10 01:51:26,110 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 from AS
2024-08-10 01:51:31,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.83 vs. limit=15.0
2024-08-10 01:51:48,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1550, loss[loss=0.08862, beats_loss=0.01497, ecapa_loss=0.0002734, whisper_loss=0.07091, over 21721.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01248, ecapa_loss=0.0002788, whisper_loss=0.09693, over 3751991.24 frames. ], batch size: 92, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:51:53,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=305380.0, ans=0.05
2024-08-10 01:51:57,686 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 from AS
2024-08-10 01:52:03,843 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 44 from LS+wenet, 23 from Vox, 24 from AS
2024-08-10 01:52:10,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=305480.0, ans=0.09899494936611666
2024-08-10 01:52:26,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=305580.0, ans=0.07
2024-08-10 01:52:27,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=305580.0, ans=0.125
2024-08-10 01:52:38,864 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS
2024-08-10 01:52:40,839 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 from AS
2024-08-10 01:53:00,360 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 from AS
2024-08-10 01:53:07,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1600, loss[loss=0.147, beats_loss=0.01007, ecapa_loss=0.0003174, whisper_loss=0.1338, over 23696.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01238, ecapa_loss=0.0002801, whisper_loss=0.09803, over 3765037.13 frames. ], batch size: 93, lr: 2.05e-02, grad_scale: 1048576.0
2024-08-10 01:53:20,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=305880.0, ans=0.2
2024-08-10 01:53:27,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=305980.0, ans=0.2
2024-08-10 01:53:27,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.961e+01 3.443e+01 4.067e+01 6.226e+01, threshold=6.887e+01, percent-clipped=0.0
2024-08-10 01:53:28,160 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 from AS
2024-08-10 01:54:07,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=306180.0, ans=0.0
2024-08-10 01:54:08,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0
2024-08-10 01:54:15,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=306280.0, ans=0.2
2024-08-10 01:54:15,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=15.0
2024-08-10 01:54:26,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1650, loss[loss=0.1307, beats_loss=0.0108, ecapa_loss=0.0002442, whisper_loss=0.1175, over 15457.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01231, ecapa_loss=0.0002797, whisper_loss=0.09877, over 3824129.28 frames. ], batch size: 55, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:54:30,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=306380.0, ans=0.0
2024-08-10 01:54:37,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=306380.0, ans=0.125
2024-08-10 01:55:01,870 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 from AS
2024-08-10 01:55:04,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=306580.0, ans=0.0
2024-08-10 01:55:23,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=306680.0, ans=0.0
2024-08-10 01:55:26,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=306680.0, ans=0.0
2024-08-10 01:55:40,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=306780.0, ans=0.125
2024-08-10 01:55:42,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=306880.0, ans=0.125
2024-08-10 01:55:43,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1700, loss[loss=0.1229, beats_loss=0.01369, ecapa_loss=0.0002605, whisper_loss=0.1066, over 23027.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01224, ecapa_loss=0.0002804, whisper_loss=0.09941, over 3808244.29 frames. ], batch size: 92, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:56:01,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.006e+01 3.281e+01 3.850e+01 2.955e+02, threshold=6.563e+01, percent-clipped=2.0
2024-08-10 01:56:11,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=306980.0, ans=0.125
2024-08-10 01:56:14,228 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 10 from Vox, 32 from AS
2024-08-10 01:56:21,342 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 from AS
2024-08-10 01:56:21,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=307080.0, ans=0.0
2024-08-10 01:56:57,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1750, loss[loss=0.1347, beats_loss=0.01057, ecapa_loss=0.0002678, whisper_loss=0.1214, over 19495.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.0123, ecapa_loss=0.0002766, whisper_loss=0.09885, over 3849842.66 frames. ], batch size: 76, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:57:39,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=307580.0, ans=0.125
2024-08-10 01:57:46,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=307680.0, ans=0.125
2024-08-10 01:57:50,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=307680.0, ans=0.125
2024-08-10 01:57:55,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=307780.0, ans=0.07
2024-08-10 01:58:09,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1800, loss[loss=0.09903, beats_loss=0.0117, ecapa_loss=0.0003166, whisper_loss=0.08417, over 16760.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01222, ecapa_loss=0.0002776, whisper_loss=0.09846, over 3831119.78 frames. ], batch size: 66, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:58:12,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=307880.0, ans=0.125
2024-08-10 01:58:13,944 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS
2024-08-10 01:58:17,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0
2024-08-10 01:58:26,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.751e+01 3.157e+01 3.582e+01 5.631e+01, threshold=6.314e+01, percent-clipped=0.0
2024-08-10 01:58:42,454 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 from AS
2024-08-10 01:58:48,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5
2024-08-10 01:59:07,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=308280.0, ans=0.2
2024-08-10 01:59:20,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1850, loss[loss=0.1416, beats_loss=0.009267, ecapa_loss=0.0002877, whisper_loss=0.1295, over 21089.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.0122, ecapa_loss=0.000279, whisper_loss=0.09924, over 3836635.21 frames. ], batch size: 77, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 01:59:32,033 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 from AS
2024-08-10 01:59:34,879 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 31 from LS+wenet, 16 from Vox, 27 from AS
2024-08-10 01:59:35,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=308480.0, ans=0.0
2024-08-10 02:00:04,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=308680.0, ans=0.125
2024-08-10 02:00:06,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308680.0, ans=0.1
2024-08-10 02:00:29,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=308880.0, ans=0.125
2024-08-10 02:00:30,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1900, loss[loss=0.1252, beats_loss=0.01325, ecapa_loss=0.0002718, whisper_loss=0.1092, over 19608.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01227, ecapa_loss=0.0002843, whisper_loss=0.09845, over 3822387.17 frames. ], batch size: 76, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 02:00:46,684 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 12 from Vox, 31 from AS
2024-08-10 02:00:46,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=308980.0, ans=0.125
2024-08-10 02:00:47,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.250e+01 2.899e+01 3.416e+01 4.271e+01 7.702e+01, threshold=6.832e+01, percent-clipped=2.0
2024-08-10 02:00:55,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=308980.0, ans=0.125
2024-08-10 02:00:57,660 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 18 from Vox, 41 from AS
2024-08-10 02:01:00,121 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 from AS
2024-08-10 02:01:09,987 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS
2024-08-10 02:01:16,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=309180.0, ans=0.0
2024-08-10 02:01:27,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=309280.0, ans=0.2
2024-08-10 02:01:38,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.17 vs. limit=22.5
2024-08-10 02:01:39,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 1950, loss[loss=0.09136, beats_loss=0.01537, ecapa_loss=0.0003023, whisper_loss=0.07297, over 21526.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01233, ecapa_loss=0.0002898, whisper_loss=0.09867, over 3823768.79 frames. ], batch size: 90, lr: 2.04e-02, grad_scale: 1048576.0
2024-08-10 02:01:41,085 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
34 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 02:01:51,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2024-08-10 02:01:52,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=309480.0, ans=0.125 2024-08-10 02:01:59,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=309480.0, ans=0.07 2024-08-10 02:02:11,166 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 02:02:15,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=22.5 2024-08-10 02:02:16,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=309580.0, ans=0.2 2024-08-10 02:02:32,757 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-10 02:02:37,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=309780.0, ans=0.0 2024-08-10 02:02:38,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=309780.0, ans=0.0 2024-08-10 02:02:51,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2000, loss[loss=0.1009, beats_loss=0.01433, ecapa_loss=0.0003144, whisper_loss=0.08348, over 20081.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01226, ecapa_loss=0.0002956, whisper_loss=0.09881, over 3812974.28 frames. ], batch size: 83, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:02:53,007 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 02:03:07,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=309980.0, ans=10.0 2024-08-10 02:03:09,442 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.384e+01 2.983e+01 3.552e+01 3.984e+01 6.262e+01, threshold=7.103e+01, percent-clipped=0.0 2024-08-10 02:03:14,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=309980.0, ans=0.125 2024-08-10 02:03:22,388 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 02:03:23,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=310080.0, ans=0.0 2024-08-10 02:03:29,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.16 vs. limit=15.0 2024-08-10 02:03:44,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-10 02:03:47,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=310180.0, ans=0.125 2024-08-10 02:04:02,473 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 02:04:03,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2050, loss[loss=0.1239, beats_loss=0.01392, ecapa_loss=0.0002708, whisper_loss=0.1073, over 22363.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.0123, ecapa_loss=0.0002962, whisper_loss=0.09912, over 3813353.53 frames. 
], batch size: 88, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:04:09,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=310380.0, ans=0.2 2024-08-10 02:04:16,500 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.726e-01 2024-08-10 02:04:19,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=310480.0, ans=0.2 2024-08-10 02:04:32,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.03 vs. limit=10.0 2024-08-10 02:04:53,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=310680.0, ans=0.2 2024-08-10 02:04:55,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.45 vs. limit=6.0 2024-08-10 02:04:58,216 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 02:05:02,541 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 02:05:09,360 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 02:05:11,857 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 02:05:13,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2100, loss[loss=0.1042, beats_loss=0.01403, ecapa_loss=0.0003594, whisper_loss=0.08658, over 19677.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01238, ecapa_loss=0.0002976, whisper_loss=0.09862, over 3802821.73 frames. 
], batch size: 84, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:05:16,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=310880.0, ans=0.0 2024-08-10 02:05:21,567 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 02:05:29,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.901e+01 3.264e+01 3.705e+01 5.595e+01, threshold=6.528e+01, percent-clipped=0.0 2024-08-10 02:05:31,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310980.0, ans=0.1 2024-08-10 02:05:49,187 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 02:06:07,907 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 02:06:15,095 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-10 02:06:19,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=311280.0, ans=0.125 2024-08-10 02:06:20,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=311280.0, ans=0.125 2024-08-10 02:06:21,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=311280.0, ans=10.0 2024-08-10 02:06:22,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=311380.0, ans=0.0 2024-08-10 02:06:23,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2150, loss[loss=0.09685, beats_loss=0.01245, ecapa_loss=0.0003388, whisper_loss=0.08102, over 15770.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01239, ecapa_loss=0.0002984, whisper_loss=0.09887, over 3812916.92 frames. 
], batch size: 63, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:06:27,694 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 02:06:32,494 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-10 02:06:34,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=311380.0, ans=0.0 2024-08-10 02:06:40,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2024-08-10 02:06:51,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-10 02:07:13,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-10 02:07:22,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311780.0, ans=0.1 2024-08-10 02:07:32,681 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-10 02:07:37,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=311880.0, ans=0.125 2024-08-10 02:07:38,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2200, loss[loss=0.105, beats_loss=0.01435, ecapa_loss=0.0002645, whisper_loss=0.08796, over 19940.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01239, ecapa_loss=0.0002978, whisper_loss=0.09893, over 3796150.96 frames. 
], batch size: 79, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:07:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=311880.0, ans=0.0 2024-08-10 02:07:51,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=311980.0, ans=0.125 2024-08-10 02:07:51,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=15.0 2024-08-10 02:07:55,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.913e+01 3.407e+01 3.904e+01 7.612e+01, threshold=6.814e+01, percent-clipped=1.0 2024-08-10 02:08:06,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312080.0, ans=0.125 2024-08-10 02:08:13,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312080.0, ans=0.1 2024-08-10 02:08:14,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=312080.0, ans=0.0 2024-08-10 02:08:23,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=312180.0, ans=0.2 2024-08-10 02:08:26,020 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 02:08:34,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=312180.0, ans=0.2 2024-08-10 02:08:43,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312280.0, ans=0.125 2024-08-10 02:08:47,768 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 02:08:50,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2250, loss[loss=0.1279, beats_loss=0.01018, ecapa_loss=0.0003131, whisper_loss=0.1146, over 23589.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01237, ecapa_loss=0.0002972, whisper_loss=0.09991, over 3831239.92 frames. ], batch size: 90, lr: 2.03e-02, grad_scale: 1048576.0 2024-08-10 02:09:10,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=312480.0, ans=0.125 2024-08-10 02:09:20,541 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 02:09:22,079 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 02:09:23,360 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 02:09:25,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-08-10 02:09:27,990 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 02:09:33,527 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 02:10:03,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2300, loss[loss=0.127, beats_loss=0.01403, ecapa_loss=0.0003024, whisper_loss=0.11, over 19155.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01236, ecapa_loss=0.0002987, whisper_loss=0.09963, over 3834943.66 frames. ], batch size: 78, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:10:21,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 3.060e+01 3.416e+01 3.893e+01 7.548e+01, threshold=6.833e+01, percent-clipped=2.0 2024-08-10 02:10:41,198 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 02:10:42,850 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 02:10:44,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313080.0, ans=0.125 2024-08-10 02:10:44,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=313080.0, ans=0.125 2024-08-10 02:10:54,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=10.0 2024-08-10 02:10:56,208 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 02:10:57,739 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-10 02:11:14,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2350, loss[loss=0.124, beats_loss=0.0111, ecapa_loss=0.0003185, whisper_loss=0.1098, over 21597.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01241, ecapa_loss=0.0002973, whisper_loss=0.09935, over 3835598.47 frames. ], batch size: 84, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:11:19,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313380.0, ans=0.1 2024-08-10 02:11:21,851 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 02:11:25,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=313380.0, ans=0.125 2024-08-10 02:11:37,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313480.0, ans=0.1 2024-08-10 02:11:38,939 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 02:11:40,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2024-08-10 02:12:04,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=313680.0, ans=0.0 2024-08-10 02:12:09,401 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 02:12:11,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=313680.0, ans=0.125 2024-08-10 02:12:28,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2400, loss[loss=0.1039, beats_loss=0.01717, ecapa_loss=0.0001734, whisper_loss=0.08502, over 14576.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.0123, ecapa_loss=0.0002991, whisper_loss=0.1, over 3813699.51 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:12:33,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2024-08-10 02:12:44,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.008e+01 3.355e+01 4.317e+01 6.888e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 02:13:05,037 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 02:13:30,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=314280.0, ans=0.125 2024-08-10 02:13:40,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2450, loss[loss=0.1248, beats_loss=0.009938, ecapa_loss=0.0003044, whisper_loss=0.1118, over 18117.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01226, ecapa_loss=0.0002982, whisper_loss=0.1002, over 3859204.18 frames. ], batch size: 73, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:13:46,684 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 02:13:47,863 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 02:13:48,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2024-08-10 02:13:52,348 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 12 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 02:13:57,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=314480.0, ans=0.0 2024-08-10 02:13:57,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=314480.0, ans=0.0 2024-08-10 02:14:00,798 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 02:14:11,957 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 02:14:13,510 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 02:14:16,871 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 02:14:17,902 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 02:14:40,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-08-10 02:14:44,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=314780.0, ans=0.125 2024-08-10 02:14:54,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2500, loss[loss=0.1255, beats_loss=0.01023, ecapa_loss=0.000292, whisper_loss=0.1124, over 18702.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01224, ecapa_loss=0.0002997, whisper_loss=0.1002, over 3847090.09 frames. ], batch size: 73, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:15:12,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.053e+01 3.458e+01 4.005e+01 5.985e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 02:15:19,834 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 02:15:30,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315080.0, ans=0.125 2024-08-10 02:15:30,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=315080.0, ans=0.2 2024-08-10 02:15:33,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=315080.0, ans=0.125 2024-08-10 02:15:50,551 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 02:15:56,612 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 02:16:05,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=12.0 2024-08-10 02:16:07,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2550, loss[loss=0.1198, beats_loss=0.01206, ecapa_loss=0.0002913, whisper_loss=0.1049, over 22579.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01238, ecapa_loss=0.0002968, whisper_loss=0.09919, over 3856911.37 frames. ], batch size: 93, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:16:28,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=315480.0, ans=0.0 2024-08-10 02:16:30,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=315480.0, ans=0.0 2024-08-10 02:16:50,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-08-10 02:16:51,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0 2024-08-10 02:17:13,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2024-08-10 02:17:20,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2600, loss[loss=0.1086, beats_loss=0.01218, ecapa_loss=0.000328, whisper_loss=0.09312, over 20867.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01238, ecapa_loss=0.0002954, whisper_loss=0.09866, over 3863684.61 frames. 
], batch size: 84, lr: 2.02e-02, grad_scale: 1048576.0 2024-08-10 02:17:38,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+01 2.748e+01 3.170e+01 3.706e+01 6.461e+01, threshold=6.341e+01, percent-clipped=0.0 2024-08-10 02:17:39,731 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 02:17:45,951 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 02:17:56,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=316080.0, ans=10.0 2024-08-10 02:18:01,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=316080.0, ans=0.125 2024-08-10 02:18:12,418 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-10 02:18:28,746 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-10 02:18:38,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2650, loss[loss=0.1143, beats_loss=0.01213, ecapa_loss=0.0002979, whisper_loss=0.09924, over 21980.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01243, ecapa_loss=0.0002987, whisper_loss=0.09872, over 3903350.56 frames. ], batch size: 86, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:18:47,619 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 02:18:51,766 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 02:19:48,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. 
limit=15.0 2024-08-10 02:19:54,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2700, loss[loss=0.09758, beats_loss=0.01556, ecapa_loss=0.0002969, whisper_loss=0.07905, over 17286.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01246, ecapa_loss=0.0002974, whisper_loss=0.09897, over 3908987.35 frames. ], batch size: 66, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:19:54,571 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 02:20:03,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316880.0, ans=0.1 2024-08-10 02:20:11,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.947e+01 3.317e+01 3.968e+01 5.790e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 02:20:40,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=317180.0, ans=0.125 2024-08-10 02:20:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=317180.0, ans=0.125 2024-08-10 02:21:03,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=317280.0, ans=0.125 2024-08-10 02:21:07,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2750, loss[loss=0.1199, beats_loss=0.0137, ecapa_loss=0.0002542, whisper_loss=0.1037, over 16864.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01248, ecapa_loss=0.0002975, whisper_loss=0.09862, over 3902097.81 frames. ], batch size: 60, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:21:10,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-10 02:21:16,992 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 02:21:34,993 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.290e-01 2024-08-10 02:21:52,352 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 02:21:57,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0 2024-08-10 02:22:02,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317680.0, ans=0.1 2024-08-10 02:22:11,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317780.0, ans=0.1 2024-08-10 02:22:14,544 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 02:22:18,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=317780.0, ans=0.125 2024-08-10 02:22:24,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2800, loss[loss=0.1288, beats_loss=0.01261, ecapa_loss=0.0003017, whisper_loss=0.1132, over 19704.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0125, ecapa_loss=0.0002961, whisper_loss=0.09862, over 3888739.80 frames. 
], batch size: 77, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:22:40,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=317980.0, ans=0.5 2024-08-10 02:22:43,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.037e+01 3.440e+01 4.229e+01 1.125e+02, threshold=6.879e+01, percent-clipped=1.0 2024-08-10 02:22:52,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=317980.0, ans=0.0 2024-08-10 02:22:54,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=318080.0, ans=0.025 2024-08-10 02:23:09,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=318180.0, ans=0.05 2024-08-10 02:23:14,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.33 vs. limit=22.5 2024-08-10 02:23:16,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=318180.0, ans=0.0 2024-08-10 02:23:23,900 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 02:23:34,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=318280.0, ans=0.0 2024-08-10 02:23:38,110 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 02:23:39,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2850, loss[loss=0.1202, beats_loss=0.01225, ecapa_loss=0.0003476, whisper_loss=0.1044, over 21138.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.0125, ecapa_loss=0.0002967, whisper_loss=0.09896, over 3893389.36 frames. 
], batch size: 89, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:23:51,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=318380.0, ans=0.05 2024-08-10 02:24:20,523 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-10 02:24:28,220 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 02:24:39,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=318680.0, ans=6.0 2024-08-10 02:24:43,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2024-08-10 02:25:01,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2900, loss[loss=0.1068, beats_loss=0.01102, ecapa_loss=0.0003278, whisper_loss=0.09251, over 16561.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01251, ecapa_loss=0.0002974, whisper_loss=0.09883, over 3899850.02 frames. ], batch size: 66, lr: 2.01e-02, grad_scale: 1048576.0 2024-08-10 02:25:02,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318880.0, ans=0.125 2024-08-10 02:25:04,872 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 02:25:06,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=318880.0, ans=0.125 2024-08-10 02:25:09,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=318880.0, ans=0.2 2024-08-10 02:25:14,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=318880.0, ans=0.95 2024-08-10 02:25:19,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+01 2.971e+01 3.564e+01 4.159e+01 7.122e+01, threshold=7.127e+01, percent-clipped=1.0 2024-08-10 02:25:23,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=318980.0, ans=0.125 2024-08-10 02:25:31,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-10 02:25:39,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2024-08-10 02:25:50,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=319180.0, ans=0.07 2024-08-10 02:26:07,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0 2024-08-10 02:26:16,416 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 02:26:17,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 2950, loss[loss=0.1172, beats_loss=0.01343, ecapa_loss=0.0002736, whisper_loss=0.1011, over 22987.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.0124, ecapa_loss=0.0003006, whisper_loss=0.09947, over 3912182.66 frames. 
], batch size: 91, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:26:17,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=319380.0, ans=0.09899494936611666 2024-08-10 02:26:38,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5 2024-08-10 02:26:48,056 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 02:27:07,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=319680.0, ans=0.125 2024-08-10 02:27:17,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=319780.0, ans=0.2 2024-08-10 02:27:23,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3000, loss[loss=0.1173, beats_loss=0.01176, ecapa_loss=0.0002776, whisper_loss=0.1028, over 19971.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01239, ecapa_loss=0.0003, whisper_loss=0.09995, over 3933976.46 frames. ], batch size: 78, lr: 2.00e-02, grad_scale: 1048576.0 2024-08-10 02:27:23,995 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 02:28:04,491 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on ASR_libri: loss=0.2772, beats_loss=0, ecapa_loss=0.0008938, whisper_loss=0.2682, over 922467.00 frames. 2024-08-10 02:28:22,933 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on SV_voxceleb1: loss=0.007832, beats_loss=0, ecapa_loss=0.0007832, whisper_loss=0, over 939242.00 frames. 2024-08-10 02:30:19,767 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on AT_audioset: loss=0.02861, beats_loss=0.02861, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 02:30:19,771 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 02:30:31,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=319880.0, ans=0.125 2024-08-10 02:30:32,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=319980.0, ans=0.2 2024-08-10 02:30:35,295 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-32000.pt 2024-08-10 02:30:38,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.871e+01 3.251e+01 3.853e+01 5.451e+01, threshold=6.502e+01, percent-clipped=0.0 2024-08-10 02:30:42,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=319980.0, ans=0.0 2024-08-10 02:30:44,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. 
limit=15.0 2024-08-10 02:30:46,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=319980.0, ans=0.025 2024-08-10 02:30:59,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320080.0, ans=0.1 2024-08-10 02:31:12,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320180.0, ans=0.1 2024-08-10 02:31:13,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=320180.0, ans=0.125 2024-08-10 02:31:27,062 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 02:31:30,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3050, loss[loss=0.09824, beats_loss=0.01644, ecapa_loss=0.0003283, whisper_loss=0.07851, over 19350.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01234, ecapa_loss=0.0002997, whisper_loss=0.0998, over 3902508.23 frames. ], batch size: 84, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:31:37,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=15.0 2024-08-10 02:31:46,487 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 02:32:00,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=12.0 2024-08-10 02:32:02,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=320580.0, ans=0.1 2024-08-10 02:32:17,805 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 02:32:40,001 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3100, loss[loss=0.1242, beats_loss=0.01147, ecapa_loss=0.0003252, whisper_loss=0.1094, over 22111.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.0123, ecapa_loss=0.0002994, whisper_loss=0.1006, over 3900464.33 frames. ], batch size: 90, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:32:56,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 2.934e+01 3.353e+01 3.892e+01 7.432e+01, threshold=6.707e+01, percent-clipped=2.0 2024-08-10 02:32:58,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=320980.0, ans=0.0 2024-08-10 02:33:01,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0 2024-08-10 02:33:09,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321080.0, ans=0.0 2024-08-10 02:33:31,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=321180.0, ans=0.125 2024-08-10 02:33:48,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3150, loss[loss=0.1134, beats_loss=0.01169, ecapa_loss=0.0003876, whisper_loss=0.09784, over 21070.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01233, ecapa_loss=0.0003006, whisper_loss=0.1002, over 3888055.70 frames. ], batch size: 88, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:34:12,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321480.0, ans=0.1 2024-08-10 02:34:15,160 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 02:34:42,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2024-08-10 02:34:48,027 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 02:34:48,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=321780.0, ans=0.125 2024-08-10 02:34:52,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=321780.0, ans=0.025 2024-08-10 02:34:52,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=321780.0, ans=0.125 2024-08-10 02:34:53,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-10 02:34:54,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321780.0, ans=0.1 2024-08-10 02:34:57,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3200, loss[loss=0.1254, beats_loss=0.01082, ecapa_loss=0.0002502, whisper_loss=0.1121, over 14991.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01231, ecapa_loss=0.0002995, whisper_loss=0.101, over 3889507.15 frames. 
], batch size: 55, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:35:04,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=321880.0, ans=0.0 2024-08-10 02:35:08,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321880.0, ans=0.125 2024-08-10 02:35:13,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.260e+01 2.789e+01 3.261e+01 3.853e+01 5.155e+01, threshold=6.521e+01, percent-clipped=0.0 2024-08-10 02:35:13,411 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-10 02:35:16,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=321980.0, ans=0.0 2024-08-10 02:35:20,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=321980.0, ans=0.0 2024-08-10 02:35:20,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=321980.0, ans=15.0 2024-08-10 02:35:22,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=321980.0, ans=0.0 2024-08-10 02:35:34,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=322080.0, ans=0.125 2024-08-10 02:35:36,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.27 vs. 
limit=15.0 2024-08-10 02:36:03,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=322280.0, ans=0.2 2024-08-10 02:36:06,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3250, loss[loss=0.1002, beats_loss=0.01499, ecapa_loss=0.0002874, whisper_loss=0.08236, over 22604.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01222, ecapa_loss=0.0003007, whisper_loss=0.1017, over 3908749.30 frames. ], batch size: 92, lr: 2.00e-02, grad_scale: 2097152.0 2024-08-10 02:36:08,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=322380.0, ans=0.125 2024-08-10 02:36:30,280 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 02:36:41,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=322580.0, ans=0.125 2024-08-10 02:36:41,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-08-10 02:37:15,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3300, loss[loss=0.1442, beats_loss=0.009292, ecapa_loss=0.0003557, whisper_loss=0.1314, over 19760.00 frames. ], tot_loss[loss=0.1165, beats_loss=0.01221, ecapa_loss=0.0002989, whisper_loss=0.1013, over 3911153.79 frames. 
], batch size: 76, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:37:25,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322880.0, ans=0.1 2024-08-10 02:37:26,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=322880.0, ans=0.125 2024-08-10 02:37:31,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.021e+01 3.431e+01 4.015e+01 7.071e+01, threshold=6.862e+01, percent-clipped=2.0 2024-08-10 02:37:31,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=322980.0, ans=0.0 2024-08-10 02:37:49,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=323080.0, ans=0.0 2024-08-10 02:37:55,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=323180.0, ans=0.125 2024-08-10 02:38:10,731 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 02:38:20,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-08-10 02:38:23,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3350, loss[loss=0.09748, beats_loss=0.0146, ecapa_loss=0.0002863, whisper_loss=0.08002, over 17506.00 frames. ], tot_loss[loss=0.1169, beats_loss=0.01217, ecapa_loss=0.0002979, whisper_loss=0.1017, over 3923026.81 frames. ], batch size: 72, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:38:28,244 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 02:38:31,394 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.935e-01 2024-08-10 02:38:37,782 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 02:38:43,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=323480.0, ans=0.0 2024-08-10 02:38:52,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=323580.0, ans=0.0 2024-08-10 02:38:58,648 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 02:39:05,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=323680.0, ans=0.2 2024-08-10 02:39:09,762 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 02:39:10,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=323680.0, ans=0.0 2024-08-10 02:39:12,496 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 02:39:18,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323780.0, ans=0.125 2024-08-10 02:39:20,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2024-08-10 02:39:21,783 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 30 from Vox, 23 fro AS 2024-08-10 02:39:31,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3400, loss[loss=0.1373, beats_loss=0.009657, ecapa_loss=0.000319, whisper_loss=0.1245, over 23765.00 frames. 
], tot_loss[loss=0.116, beats_loss=0.01215, ecapa_loss=0.0002999, whisper_loss=0.1008, over 3910743.01 frames. ], batch size: 92, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:39:31,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=323880.0, ans=0.125 2024-08-10 02:39:47,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+01 2.812e+01 3.293e+01 3.899e+01 6.283e+01, threshold=6.585e+01, percent-clipped=0.0 2024-08-10 02:39:52,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2024-08-10 02:39:58,807 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 02:40:14,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=324180.0, ans=0.2 2024-08-10 02:40:14,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-08-10 02:40:22,173 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 02:40:27,442 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 02:40:37,066 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 02:40:39,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3450, loss[loss=0.0839, beats_loss=0.01573, ecapa_loss=0.0003045, whisper_loss=0.06512, over 22174.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01224, ecapa_loss=0.0002994, whisper_loss=0.09923, over 3907805.69 frames. ], batch size: 96, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:40:41,001 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 02:40:41,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=324380.0, ans=0.125 2024-08-10 02:40:48,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=324380.0, ans=0.2 2024-08-10 02:40:53,828 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 02:41:04,930 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 02:41:17,635 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 02:41:30,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5 2024-08-10 02:41:31,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=324680.0, ans=0.125 2024-08-10 02:41:47,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=324880.0, ans=0.0 2024-08-10 02:41:48,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3500, loss[loss=0.1126, beats_loss=0.01243, ecapa_loss=0.0003226, whisper_loss=0.09697, over 22389.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01234, ecapa_loss=0.000299, whisper_loss=0.0994, over 3923504.70 frames. 
], batch size: 93, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:41:54,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=324880.0, ans=0.125 2024-08-10 02:42:05,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 3.058e+01 3.643e+01 4.338e+01 7.554e+01, threshold=7.285e+01, percent-clipped=1.0 2024-08-10 02:42:05,505 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 02:42:06,862 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 02:42:08,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=324980.0, ans=0.0 2024-08-10 02:42:19,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.101e-01 2024-08-10 02:42:53,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=325280.0, ans=0.125 2024-08-10 02:42:57,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3550, loss[loss=0.1368, beats_loss=0.01106, ecapa_loss=0.0003109, whisper_loss=0.1226, over 22514.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01239, ecapa_loss=0.0002967, whisper_loss=0.09937, over 3932960.75 frames. ], batch size: 91, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:42:59,296 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 02:43:10,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=325480.0, ans=0.125 2024-08-10 02:43:20,510 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 02:43:26,278 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.978e-01 2024-08-10 02:43:44,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325680.0, ans=0.1 2024-08-10 02:43:50,881 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 02:44:07,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3600, loss[loss=0.1154, beats_loss=0.01351, ecapa_loss=0.0002291, whisper_loss=0.09965, over 18431.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01239, ecapa_loss=0.0002967, whisper_loss=0.09956, over 3930052.44 frames. ], batch size: 70, lr: 1.99e-02, grad_scale: 2097152.0 2024-08-10 02:44:13,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=325880.0, ans=0.035 2024-08-10 02:44:23,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.942e+01 3.351e+01 3.815e+01 6.062e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 02:44:35,331 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 02:44:42,212 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 31 from LS+wenet, 7 from Vox, 21 fro AS 2024-08-10 02:44:47,901 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 02:45:10,114 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 02:45:17,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3650, loss[loss=0.09885, beats_loss=0.01297, ecapa_loss=0.0002956, whisper_loss=0.08293, over 18600.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01231, ecapa_loss=0.0002966, whisper_loss=0.1003, over 3906180.68 frames. 
], batch size: 75, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:45:48,862 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 02:45:53,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-08-10 02:45:55,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=326580.0, ans=0.07 2024-08-10 02:45:58,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.42 vs. limit=10.0 2024-08-10 02:46:01,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-08-10 02:46:03,649 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 02:46:04,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326680.0, ans=0.125 2024-08-10 02:46:15,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326780.0, ans=0.1 2024-08-10 02:46:16,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=326780.0, ans=0.025 2024-08-10 02:46:25,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3700, loss[loss=0.1104, beats_loss=0.01325, ecapa_loss=0.0003103, whisper_loss=0.09407, over 18491.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01237, ecapa_loss=0.0002951, whisper_loss=0.09993, over 3905254.11 frames. ], batch size: 75, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:46:26,008 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
17 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 02:46:42,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.947e+01 3.360e+01 4.039e+01 7.794e+01, threshold=6.721e+01, percent-clipped=1.0 2024-08-10 02:47:00,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=327080.0, ans=0.0 2024-08-10 02:47:01,817 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 19 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 02:47:03,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-10 02:47:18,198 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 26 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-10 02:47:22,031 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 02:47:23,549 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 02:47:33,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=327380.0, ans=0.125 2024-08-10 02:47:33,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3750, loss[loss=0.1085, beats_loss=0.01338, ecapa_loss=0.0003269, whisper_loss=0.09184, over 21731.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01237, ecapa_loss=0.0002965, whisper_loss=0.09974, over 3891710.19 frames. ], batch size: 91, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:47:37,109 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 02:47:40,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.79 vs. 
limit=22.5 2024-08-10 02:47:41,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327380.0, ans=0.125 2024-08-10 02:47:54,561 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 02:47:57,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=327480.0, ans=0.2 2024-08-10 02:48:03,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=327580.0, ans=0.0 2024-08-10 02:48:14,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=327680.0, ans=0.125 2024-08-10 02:48:30,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=327780.0, ans=0.125 2024-08-10 02:48:32,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=327780.0, ans=0.125 2024-08-10 02:48:42,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3800, loss[loss=0.1255, beats_loss=0.01324, ecapa_loss=0.0003802, whisper_loss=0.1084, over 20907.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.0125, ecapa_loss=0.0002965, whisper_loss=0.099, over 3911259.80 frames. ], batch size: 92, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:48:58,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.560e+01 3.072e+01 3.520e+01 3.991e+01 6.360e+01, threshold=7.040e+01, percent-clipped=0.0 2024-08-10 02:49:15,836 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 02:49:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=328280.0, ans=0.0 2024-08-10 02:49:42,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=328280.0, ans=0.125 2024-08-10 02:49:51,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3850, loss[loss=0.09871, beats_loss=0.01399, ecapa_loss=0.0002391, whisper_loss=0.08233, over 14483.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01248, ecapa_loss=0.0002965, whisper_loss=0.09929, over 3883804.01 frames. ], batch size: 55, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:49:52,188 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 27 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-10 02:50:10,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=328480.0, ans=0.0 2024-08-10 02:50:14,105 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 02:50:19,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=328580.0, ans=0.125 2024-08-10 02:50:27,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=328580.0, ans=0.07 2024-08-10 02:50:40,898 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 02:50:51,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-10 02:50:59,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3900, loss[loss=0.114, beats_loss=0.01522, ecapa_loss=0.0002761, whisper_loss=0.09601, over 18092.00 frames. 
], tot_loss[loss=0.1152, beats_loss=0.01245, ecapa_loss=0.0002982, whisper_loss=0.09972, over 3879595.63 frames. ], batch size: 72, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:51:07,915 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.412e+00 2024-08-10 02:51:15,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 3.142e+01 3.558e+01 4.007e+01 5.949e+01, threshold=7.115e+01, percent-clipped=0.0 2024-08-10 02:51:39,076 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 02:52:06,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-08-10 02:52:06,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 3950, loss[loss=0.1041, beats_loss=0.01298, ecapa_loss=0.0003552, whisper_loss=0.08761, over 20244.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01231, ecapa_loss=0.0003017, whisper_loss=0.1006, over 3870287.48 frames. ], batch size: 86, lr: 1.98e-02, grad_scale: 2097152.0 2024-08-10 02:52:22,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329480.0, ans=0.1 2024-08-10 02:52:28,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=329480.0, ans=0.125 2024-08-10 02:52:36,613 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 23 from Vox, 49 from AS 2024-08-10 02:52:40,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=329580.0, ans=0.125 2024-08-10 02:52:43,278 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
26 from LS+wenet, 28 from Vox, 26 from AS 2024-08-10 02:53:13,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4000, loss[loss=0.1257, beats_loss=0.01195, ecapa_loss=0.0002787, whisper_loss=0.111, over 23134.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01231, ecapa_loss=0.0003007, whisper_loss=0.1011, over 3895744.15 frames. ], batch size: 94, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:53:17,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329880.0, ans=0.1 2024-08-10 02:53:26,266 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 02:53:26,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=329980.0, ans=0.125 2024-08-10 02:53:28,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329980.0, ans=0.1 2024-08-10 02:53:30,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 3.066e+01 3.430e+01 3.923e+01 5.367e+01, threshold=6.859e+01, percent-clipped=0.0 2024-08-10 02:53:56,341 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 from AS 2024-08-10 02:54:02,998 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 02:54:07,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.98 vs. limit=22.5 2024-08-10 02:54:21,823 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4050, loss[loss=0.1219, beats_loss=0.00952, ecapa_loss=0.0003282, whisper_loss=0.1091, over 20078.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01222, ecapa_loss=0.0002996, whisper_loss=0.101, over 3898863.99 frames.
], batch size: 78, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:54:23,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330380.0, ans=0.1 2024-08-10 02:54:26,159 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 from AS 2024-08-10 02:54:32,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-10 02:54:36,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=330480.0, ans=0.0 2024-08-10 02:54:46,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=330480.0, ans=0.125 2024-08-10 02:55:04,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:11,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:12,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:13,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=330680.0, ans=0.125 2024-08-10 02:55:22,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=330780.0, ans=0.125 2024-08-10 02:55:28,886 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4100, loss[loss=0.1106, beats_loss=0.01394, ecapa_loss=0.0002791, whisper_loss=0.09391, over 23124.00 frames.
], tot_loss[loss=0.116, beats_loss=0.01227, ecapa_loss=0.0002992, whisper_loss=0.1007, over 3880833.14 frames. ], batch size: 95, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:55:31,846 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 02:55:37,386 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 19 from Vox, 47 from AS 2024-08-10 02:55:38,704 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 29 from Vox, 31 from AS 2024-08-10 02:55:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=330880.0, ans=0.2 2024-08-10 02:55:43,901 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 02:55:44,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.505e+01 2.941e+01 3.171e+01 3.928e+01 6.026e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 02:55:51,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.75 vs. limit=22.5 2024-08-10 02:56:11,079 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.258e+00 2024-08-10 02:56:29,403 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 02:56:35,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4150, loss[loss=0.1105, beats_loss=0.0115, ecapa_loss=0.000306, whisper_loss=0.09599, over 15410.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.0123, ecapa_loss=0.0002996, whisper_loss=0.1, over 3893815.61 frames. ], batch size: 62, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:56:36,158 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 02:56:42,763 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
26 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 02:56:44,110 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 from AS 2024-08-10 02:56:46,839 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 02:56:58,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331480.0, ans=0.125 2024-08-10 02:57:00,397 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 31 from Vox, 32 from AS 2024-08-10 02:57:01,700 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 02:57:41,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=331880.0, ans=0.125 2024-08-10 02:57:42,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4200, loss[loss=0.128, beats_loss=0.01207, ecapa_loss=0.0002365, whisper_loss=0.1135, over 21388.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01233, ecapa_loss=0.0002988, whisper_loss=0.0997, over 3897683.78 frames. ], batch size: 82, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:57:58,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.978e+01 3.511e+01 4.145e+01 7.481e+01, threshold=7.022e+01, percent-clipped=3.0 2024-08-10 02:58:08,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=332080.0, ans=0.125 2024-08-10 02:58:36,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-10 02:58:47,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.37 vs.
limit=15.0 2024-08-10 02:58:50,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4250, loss[loss=0.09836, beats_loss=0.01189, ecapa_loss=0.0002905, whisper_loss=0.08357, over 17148.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01237, ecapa_loss=0.0002966, whisper_loss=0.0994, over 3912530.91 frames. ], batch size: 68, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 02:58:56,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=332380.0, ans=0.125 2024-08-10 02:59:17,372 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 36 from LS+wenet, 12 from Vox, 36 from AS 2024-08-10 02:59:23,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=15.0 2024-08-10 02:59:25,520 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 33 from Vox, 26 from AS 2024-08-10 02:59:44,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332780.0, ans=0.1 2024-08-10 02:59:59,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4300, loss[loss=0.1244, beats_loss=0.01322, ecapa_loss=0.0003019, whisper_loss=0.1081, over 22009.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01228, ecapa_loss=0.0002948, whisper_loss=0.09997, over 3916850.17 frames. ], batch size: 90, lr: 1.97e-02, grad_scale: 2097152.0 2024-08-10 03:00:06,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332880.0, ans=0.1 2024-08-10 03:00:15,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.965e+01 3.329e+01 3.837e+01 6.258e+01, threshold=6.658e+01, percent-clipped=0.0 2024-08-10 03:00:27,869 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
25 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 03:00:57,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=333280.0, ans=0.125 2024-08-10 03:01:04,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=333280.0, ans=0.0 2024-08-10 03:01:06,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4350, loss[loss=0.1104, beats_loss=0.01051, ecapa_loss=0.0003241, whisper_loss=0.09665, over 16879.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01222, ecapa_loss=0.0002968, whisper_loss=0.1, over 3913422.64 frames. ], batch size: 69, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:01:12,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333380.0, ans=0.1 2024-08-10 03:01:42,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=333580.0, ans=0.125 2024-08-10 03:01:51,548 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 from AS 2024-08-10 03:02:13,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4400, loss[loss=0.08388, beats_loss=0.01419, ecapa_loss=0.0002778, whisper_loss=0.06691, over 22135.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01228, ecapa_loss=0.0002955, whisper_loss=0.09934, over 3899974.58 frames.
], batch size: 90, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:02:14,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333880.0, ans=0.1 2024-08-10 03:02:21,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=333880.0, ans=0.125 2024-08-10 03:02:21,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333880.0, ans=0.1 2024-08-10 03:02:30,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 2.919e+01 3.248e+01 3.746e+01 6.587e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 03:03:16,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=334280.0, ans=0.2 2024-08-10 03:03:22,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4450, loss[loss=0.1216, beats_loss=0.01565, ecapa_loss=0.0002819, whisper_loss=0.1031, over 22348.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01228, ecapa_loss=0.0002963, whisper_loss=0.09991, over 3909602.97 frames. ], batch size: 90, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:03:22,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=334380.0, ans=0.125 2024-08-10 03:03:47,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=334480.0, ans=0.0 2024-08-10 03:03:48,199 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 03:04:02,749 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
26 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 03:04:15,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334680.0, ans=0.1 2024-08-10 03:04:20,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=334780.0, ans=0.02 2024-08-10 03:04:26,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=334780.0, ans=0.125 2024-08-10 03:04:33,836 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 from AS 2024-08-10 03:04:35,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4500, loss[loss=0.1186, beats_loss=0.01439, ecapa_loss=0.0002634, whisper_loss=0.1016, over 23722.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01233, ecapa_loss=0.0002969, whisper_loss=0.09924, over 3923072.38 frames. ], batch size: 93, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:04:52,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+01 3.021e+01 3.497e+01 4.022e+01 7.846e+01, threshold=6.995e+01, percent-clipped=4.0 2024-08-10 03:05:02,535 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 03:05:03,877 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 03:05:11,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335080.0, ans=0.1 2024-08-10 03:05:29,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335180.0, ans=0.125 2024-08-10 03:05:31,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.18 vs.
limit=15.0 2024-08-10 03:05:46,514 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4550, loss[loss=0.1251, beats_loss=0.01071, ecapa_loss=0.0003521, whisper_loss=0.1109, over 17787.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01227, ecapa_loss=0.0002985, whisper_loss=0.09955, over 3897983.96 frames. ], batch size: 74, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:05:48,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=335380.0, ans=0.125 2024-08-10 03:05:53,163 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 03:06:15,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=335580.0, ans=12.0 2024-08-10 03:06:30,889 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 from AS 2024-08-10 03:06:54,528 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 from AS 2024-08-10 03:06:54,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=335780.0, ans=0.0 2024-08-10 03:06:58,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4600, loss[loss=0.1069, beats_loss=0.01413, ecapa_loss=0.0002204, whisper_loss=0.09054, over 21514.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01225, ecapa_loss=0.0002995, whisper_loss=0.0993, over 3877813.13 frames.
], batch size: 82, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:07:15,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.246e+01 3.685e+01 4.349e+01 7.107e+01, threshold=7.370e+01, percent-clipped=1.0 2024-08-10 03:07:17,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=335980.0, ans=0.125 2024-08-10 03:07:20,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335980.0, ans=0.0 2024-08-10 03:07:29,502 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 24 from Vox, 21 from AS 2024-08-10 03:07:31,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=336080.0, ans=0.0 2024-08-10 03:07:34,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=336080.0, ans=0.0 2024-08-10 03:07:35,204 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 23 from Vox, 25 from AS 2024-08-10 03:07:43,417 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 03:07:55,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=336280.0, ans=0.125 2024-08-10 03:07:59,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=336280.0, ans=0.05 2024-08-10 03:07:59,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.21 vs. limit=22.5 2024-08-10 03:08:01,766 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
22 from LS+wenet, 23 from Vox, 43 from AS 2024-08-10 03:08:10,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4650, loss[loss=0.1094, beats_loss=0.01487, ecapa_loss=0.0002938, whisper_loss=0.09157, over 20867.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.01233, ecapa_loss=0.0002995, whisper_loss=0.09917, over 3890008.01 frames. ], batch size: 86, lr: 1.96e-02, grad_scale: 2097152.0 2024-08-10 03:08:12,124 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 16 from Vox, 41 from AS 2024-08-10 03:08:14,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=336380.0, ans=0.2 2024-08-10 03:08:30,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=336480.0, ans=0.125 2024-08-10 03:08:39,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.97 vs. limit=15.0 2024-08-10 03:08:40,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=336580.0, ans=0.125 2024-08-10 03:08:50,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=336580.0, ans=0.125 2024-08-10 03:08:54,812 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 03:09:04,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=336680.0, ans=0.0 2024-08-10 03:09:22,040 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 03:09:23,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4700, loss[loss=0.1083, beats_loss=0.0119, ecapa_loss=0.000349, whisper_loss=0.0929, over 21365.00 frames.
], tot_loss[loss=0.1144, beats_loss=0.01241, ecapa_loss=0.0002955, whisper_loss=0.09908, over 3898736.63 frames. ], batch size: 92, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:09:23,277 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 03:09:31,903 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 03:09:40,388 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 3.034e+01 3.372e+01 4.313e+01 2.367e+02, threshold=6.744e+01, percent-clipped=2.0 2024-08-10 03:09:40,661 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 from AS 2024-08-10 03:09:50,348 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS 2024-08-10 03:09:54,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337080.0, ans=0.0 2024-08-10 03:10:01,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=337080.0, ans=0.035 2024-08-10 03:10:11,381 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 31 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 03:10:16,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0 2024-08-10 03:10:23,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=337280.0, ans=0.2 2024-08-10 03:10:34,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4750, loss[loss=0.1441, beats_loss=0.01259, ecapa_loss=0.0002824, whisper_loss=0.1286, over 22578.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01244, ecapa_loss=0.0002971, whisper_loss=0.09845, over 3881174.13 frames.
], batch size: 86, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:10:35,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=337380.0, ans=0.125 2024-08-10 03:10:42,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=337380.0, ans=0.125 2024-08-10 03:10:44,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0 2024-08-10 03:10:59,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=337480.0, ans=0.125 2024-08-10 03:11:02,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=337580.0, ans=0.2 2024-08-10 03:11:14,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-10 03:11:15,376 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 from AS 2024-08-10 03:11:17,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=337680.0, ans=0.0 2024-08-10 03:11:30,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=337680.0, ans=0.125 2024-08-10 03:11:47,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4800, loss[loss=0.13, beats_loss=0.01058, ecapa_loss=0.0003092, whisper_loss=0.1163, over 22680.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.0125, ecapa_loss=0.0002965, whisper_loss=0.09878, over 3906319.16 frames.
], batch size: 90, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:12:01,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=337980.0, ans=0.125 2024-08-10 03:12:03,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.006e+01 3.324e+01 3.733e+01 5.524e+01, threshold=6.647e+01, percent-clipped=0.0 2024-08-10 03:12:17,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-08-10 03:12:21,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=338080.0, ans=0.2 2024-08-10 03:12:23,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=338080.0, ans=0.0 2024-08-10 03:12:23,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2024-08-10 03:12:27,417 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 from AS 2024-08-10 03:12:40,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=338180.0, ans=0.04949747468305833 2024-08-10 03:12:43,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=338280.0, ans=0.2 2024-08-10 03:12:44,533 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 from AS 2024-08-10 03:12:47,486 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 03:12:58,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4850, loss[loss=0.1109, beats_loss=0.01012, ecapa_loss=0.0002798, whisper_loss=0.09799, over 24130.00 frames.
], tot_loss[loss=0.1142, beats_loss=0.01255, ecapa_loss=0.0002945, whisper_loss=0.09867, over 3903169.04 frames. ], batch size: 94, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:13:29,094 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 03:13:41,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=338680.0, ans=0.125 2024-08-10 03:13:43,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=338680.0, ans=0.125 2024-08-10 03:13:52,821 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 from AS 2024-08-10 03:14:02,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=338780.0, ans=0.125 2024-08-10 03:14:04,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=338780.0, ans=0.035 2024-08-10 03:14:04,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=338780.0, ans=0.0 2024-08-10 03:14:05,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=338780.0, ans=0.125 2024-08-10 03:14:08,577 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.603e-01 2024-08-10 03:14:11,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4900, loss[loss=0.09419, beats_loss=0.01417, ecapa_loss=0.000268, whisper_loss=0.07734, over 13952.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01245, ecapa_loss=0.0002955, whisper_loss=0.09964, over 3896007.85 frames.
], batch size: 54, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:14:12,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=338880.0, ans=0.125 2024-08-10 03:14:24,641 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 03:14:28,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+01 3.105e+01 3.477e+01 3.938e+01 7.192e+01, threshold=6.955e+01, percent-clipped=1.0 2024-08-10 03:14:31,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=338980.0, ans=0.2 2024-08-10 03:14:38,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=339080.0, ans=0.1 2024-08-10 03:14:48,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=339080.0, ans=0.2 2024-08-10 03:14:49,536 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 03:15:22,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 4950, loss[loss=0.09395, beats_loss=0.0137, ecapa_loss=0.0002786, whisper_loss=0.07746, over 15735.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01241, ecapa_loss=0.0002955, whisper_loss=0.09947, over 3881354.26 frames. ], batch size: 64, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:15:35,991 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 9 from Vox, 27 from AS 2024-08-10 03:15:40,452 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 03:15:45,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=339480.0, ans=0.125 2024-08-10 03:15:54,493 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts.
25 from LS+wenet, 11 from Vox, 25 from AS 2024-08-10 03:16:05,206 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 from AS 2024-08-10 03:16:06,872 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 23 from LS+wenet, 15 from Vox, 17 from AS 2024-08-10 03:16:12,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=339680.0, ans=0.125 2024-08-10 03:16:16,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=339680.0, ans=0.125 2024-08-10 03:16:16,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-08-10 03:16:36,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5000, loss[loss=0.1001, beats_loss=0.01128, ecapa_loss=0.0003429, whisper_loss=0.08537, over 18275.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01236, ecapa_loss=0.0002963, whisper_loss=0.1, over 3855288.76 frames. ], batch size: 72, lr: 1.95e-02, grad_scale: 2097152.0 2024-08-10 03:16:43,855 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 24 from Vox, 27 from AS 2024-08-10 03:16:51,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.758e+00 2024-08-10 03:16:53,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.921e+01 3.372e+01 3.826e+01 7.563e+01, threshold=6.744e+01, percent-clipped=1.0 2024-08-10 03:16:55,632 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 from AS 2024-08-10 03:17:11,338 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts.
28 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 03:17:29,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=340180.0, ans=0.1 2024-08-10 03:17:32,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340180.0, ans=0.125 2024-08-10 03:17:48,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5050, loss[loss=0.1112, beats_loss=0.01428, ecapa_loss=0.0003055, whisper_loss=0.09382, over 22002.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01244, ecapa_loss=0.0002955, whisper_loss=0.1004, over 3875330.56 frames. ], batch size: 94, lr: 1.95e-02, grad_scale: 4194304.0 2024-08-10 03:17:55,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=340380.0, ans=0.0 2024-08-10 03:17:57,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=12.0 2024-08-10 03:18:11,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-10 03:18:19,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=340580.0, ans=0.125 2024-08-10 03:18:31,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=340680.0, ans=0.125 2024-08-10 03:18:31,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=340680.0, ans=0.2 2024-08-10 03:18:35,890 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
27 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-10 03:19:00,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=340880.0, ans=0.125 2024-08-10 03:19:01,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5100, loss[loss=0.1167, beats_loss=0.009539, ecapa_loss=0.0002981, whisper_loss=0.1042, over 14134.00 frames. ], tot_loss[loss=0.1166, beats_loss=0.01224, ecapa_loss=0.0002953, whisper_loss=0.1014, over 3871243.09 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:19:19,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.974e+01 3.405e+01 3.841e+01 8.729e+01, threshold=6.810e+01, percent-clipped=2.0 2024-08-10 03:19:24,333 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 31 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 03:19:36,858 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-10 03:19:55,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341180.0, ans=0.1 2024-08-10 03:20:00,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=341180.0, ans=0.0 2024-08-10 03:20:04,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2024-08-10 03:20:17,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5150, loss[loss=0.1347, beats_loss=0.0131, ecapa_loss=0.000323, whisper_loss=0.1184, over 21505.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01228, ecapa_loss=0.0002917, whisper_loss=0.1012, over 3904670.08 frames. 
], batch size: 88, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:20:33,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=341480.0, ans=0.125 2024-08-10 03:20:36,240 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 03:20:39,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341480.0, ans=0.1 2024-08-10 03:21:12,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=341680.0, ans=0.2 2024-08-10 03:21:17,935 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-10 03:21:23,057 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 03:21:23,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=15.0 2024-08-10 03:21:24,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=341780.0, ans=0.125 2024-08-10 03:21:32,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5200, loss[loss=0.1042, beats_loss=0.01451, ecapa_loss=0.0002468, whisper_loss=0.08718, over 18214.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01226, ecapa_loss=0.000292, whisper_loss=0.101, over 3904939.47 frames. ], batch size: 72, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:21:33,259 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 03:21:36,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=341880.0, ans=0.125 2024-08-10 03:21:42,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341880.0, ans=0.1 2024-08-10 03:21:43,610 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-10 03:21:51,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.930e+01 3.270e+01 3.996e+01 6.105e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-10 03:22:02,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-08-10 03:22:03,158 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 34 from Vox, 29 fro AS 2024-08-10 03:22:25,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=342180.0, ans=0.0 2024-08-10 03:22:25,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=342180.0, ans=0.125 2024-08-10 03:22:46,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5250, loss[loss=0.1228, beats_loss=0.01227, ecapa_loss=0.0003019, whisper_loss=0.1076, over 18300.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01229, ecapa_loss=0.0002931, whisper_loss=0.1007, over 3907355.26 frames. ], batch size: 73, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:22:48,431 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.658e+03 2024-08-10 03:22:58,060 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 03:22:58,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=342380.0, ans=0.0 2024-08-10 03:23:02,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=342480.0, ans=0.125 2024-08-10 03:23:09,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=342480.0, ans=0.0 2024-08-10 03:23:13,706 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 19 from LS+wenet, 24 from Vox, 52 fro AS 2024-08-10 03:23:19,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=342580.0, ans=0.125 2024-08-10 03:23:32,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342680.0, ans=0.1 2024-08-10 03:23:33,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=342680.0, ans=0.0 2024-08-10 03:23:36,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=342680.0, ans=0.125 2024-08-10 03:23:42,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=342680.0, ans=0.0 2024-08-10 03:23:49,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=342780.0, ans=0.0 2024-08-10 03:24:02,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5300, loss[loss=0.135, beats_loss=0.01011, ecapa_loss=0.0002496, whisper_loss=0.1224, over 18481.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01229, ecapa_loss=0.0002918, whisper_loss=0.1009, over 3921163.60 frames. 
], batch size: 70, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:24:16,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=342980.0, ans=0.125 2024-08-10 03:24:19,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=342980.0, ans=0.0 2024-08-10 03:24:20,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.247e+01 2.874e+01 3.315e+01 3.923e+01 7.752e+01, threshold=6.630e+01, percent-clipped=2.0 2024-08-10 03:24:20,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=342980.0, ans=0.125 2024-08-10 03:24:24,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=342980.0, ans=0.125 2024-08-10 03:24:28,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=342980.0, ans=0.0 2024-08-10 03:24:36,023 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 03:24:50,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=343180.0, ans=0.025 2024-08-10 03:25:02,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=15.0 2024-08-10 03:25:10,072 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 14 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 03:25:15,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5350, loss[loss=0.0893, beats_loss=0.01143, ecapa_loss=0.0003126, whisper_loss=0.07475, over 14487.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01227, ecapa_loss=0.0002905, whisper_loss=0.1003, over 3898497.87 frames. 
], batch size: 59, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:25:30,791 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-10 03:25:32,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=343480.0, ans=0.025 2024-08-10 03:25:32,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-10 03:25:43,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=343580.0, ans=0.0 2024-08-10 03:25:45,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=343580.0, ans=0.0 2024-08-10 03:25:48,072 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-10 03:25:48,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=343580.0, ans=0.2 2024-08-10 03:25:59,191 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 03:26:06,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343680.0, ans=0.1 2024-08-10 03:26:31,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=343880.0, ans=0.125 2024-08-10 03:26:32,420 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5400, loss[loss=0.1134, beats_loss=0.01104, ecapa_loss=0.0003219, whisper_loss=0.09916, over 21882.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01221, ecapa_loss=0.0002898, whisper_loss=0.1006, over 3900721.59 frames. 
], batch size: 89, lr: 1.94e-02, grad_scale: 4194304.0 2024-08-10 03:26:49,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=343980.0, ans=0.125 2024-08-10 03:26:50,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.952e+01 3.404e+01 3.987e+01 5.856e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 03:26:55,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=343980.0, ans=0.125 2024-08-10 03:26:57,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=343980.0, ans=0.0 2024-08-10 03:27:16,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=344180.0, ans=0.0 2024-08-10 03:27:26,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344180.0, ans=0.125 2024-08-10 03:27:46,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5450, loss[loss=0.09572, beats_loss=0.01469, ecapa_loss=0.0002928, whisper_loss=0.0781, over 21175.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01229, ecapa_loss=0.0002906, whisper_loss=0.09968, over 3876874.20 frames. ], batch size: 91, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:27:50,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.90 vs. limit=22.5 2024-08-10 03:27:58,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=344380.0, ans=0.125 2024-08-10 03:28:12,017 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 03:28:23,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=344580.0, ans=0.125 2024-08-10 03:28:48,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=344780.0, ans=0.125 2024-08-10 03:28:56,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344780.0, ans=0.125 2024-08-10 03:29:03,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5500, loss[loss=0.1207, beats_loss=0.01263, ecapa_loss=0.0003177, whisper_loss=0.1049, over 21465.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01223, ecapa_loss=0.000293, whisper_loss=0.1002, over 3889266.65 frames. ], batch size: 90, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:29:22,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.947e+01 3.297e+01 3.879e+01 5.625e+01, threshold=6.594e+01, percent-clipped=0.0 2024-08-10 03:29:32,919 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 15 from Vox, 38 from AS 2024-08-10 03:29:46,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=345080.0, ans=0.125 2024-08-10 03:29:52,258 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 from AS 2024-08-10 03:30:13,302 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 03:30:19,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5550, loss[loss=0.1174, beats_loss=0.01087, ecapa_loss=0.0003691, whisper_loss=0.1028, over 17438.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01226, ecapa_loss=0.0002923, whisper_loss=0.1003, over 3875301.87 frames. 
], batch size: 73, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:30:20,749 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 25 from Vox, 44 from AS 2024-08-10 03:30:33,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0 2024-08-10 03:30:46,303 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 03:31:17,381 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 from AS 2024-08-10 03:31:23,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=345780.0, ans=0.05 2024-08-10 03:31:35,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5600, loss[loss=0.1214, beats_loss=0.01164, ecapa_loss=0.0003244, whisper_loss=0.1065, over 21024.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01225, ecapa_loss=0.0002939, whisper_loss=0.1001, over 3892427.93 frames. ], batch size: 83, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:31:38,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.28 vs. 
limit=22.5 2024-08-10 03:31:48,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=345880.0, ans=0.125 2024-08-10 03:31:53,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 2.950e+01 3.327e+01 3.865e+01 5.194e+01, threshold=6.655e+01, percent-clipped=0.0 2024-08-10 03:31:53,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=345980.0, ans=0.0 2024-08-10 03:32:05,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=346080.0, ans=0.2 2024-08-10 03:32:20,217 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 03:32:20,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=346180.0, ans=0.125 2024-08-10 03:32:27,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346180.0, ans=0.0 2024-08-10 03:32:30,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=346180.0, ans=0.0 2024-08-10 03:32:32,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=346280.0, ans=0.125 2024-08-10 03:32:37,740 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 25 from Vox, 26 from AS 2024-08-10 03:32:49,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5650, loss[loss=0.1285, beats_loss=0.01256, ecapa_loss=0.000336, whisper_loss=0.1126, over 13939.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01238, ecapa_loss=0.0002924, whisper_loss=0.09984, over 3881621.69 frames. 
], batch size: 54, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:33:15,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=346480.0, ans=0.0 2024-08-10 03:33:23,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=346580.0, ans=0.0 2024-08-10 03:33:23,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-10 03:33:35,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.94 vs. limit=15.0 2024-08-10 03:33:53,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=346780.0, ans=0.125 2024-08-10 03:34:01,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=346780.0, ans=0.0 2024-08-10 03:34:05,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5700, loss[loss=0.1094, beats_loss=0.01458, ecapa_loss=0.0002935, whisper_loss=0.09186, over 22253.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01234, ecapa_loss=0.0002933, whisper_loss=0.1002, over 3926849.01 frames. 
], batch size: 92, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:34:11,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=346880.0, ans=0.125 2024-08-10 03:34:11,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=346880.0, ans=0.125 2024-08-10 03:34:23,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+01 2.923e+01 3.363e+01 4.122e+01 7.176e+01, threshold=6.726e+01, percent-clipped=2.0 2024-08-10 03:34:36,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347080.0, ans=0.125 2024-08-10 03:35:03,412 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 from AS 2024-08-10 03:35:22,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5750, loss[loss=0.1419, beats_loss=0.00996, ecapa_loss=0.0003301, whisper_loss=0.1286, over 22538.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01229, ecapa_loss=0.0002947, whisper_loss=0.1006, over 3911269.60 frames. ], batch size: 86, lr: 1.93e-02, grad_scale: 4194304.0 2024-08-10 03:35:25,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=347380.0, ans=0.125 2024-08-10 03:35:25,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=347380.0, ans=0.07 2024-08-10 03:35:27,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=347380.0, ans=0.0 2024-08-10 03:35:32,252 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
28 from LS+wenet, 20 from Vox, 21 from AS 2024-08-10 03:35:41,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=347480.0, ans=0.025 2024-08-10 03:35:50,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347580.0, ans=0.1 2024-08-10 03:35:52,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=347580.0, ans=0.0 2024-08-10 03:35:53,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=347580.0, ans=10.0 2024-08-10 03:35:53,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=347580.0, ans=0.0 2024-08-10 03:36:05,684 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 29 from LS+wenet, 12 from Vox, 25 from AS 2024-08-10 03:36:05,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347680.0, ans=0.125 2024-08-10 03:36:17,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=347680.0, ans=0.0 2024-08-10 03:36:36,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5800, loss[loss=0.0925, beats_loss=0.01446, ecapa_loss=0.000315, whisper_loss=0.07488, over 20371.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01229, ecapa_loss=0.000293, whisper_loss=0.1007, over 3870474.65 frames. 
], batch size: 89, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:36:39,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=347880.0, ans=0.125 2024-08-10 03:36:42,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=347880.0, ans=0.0 2024-08-10 03:36:52,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=347980.0, ans=0.0 2024-08-10 03:36:53,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347980.0, ans=0.1 2024-08-10 03:36:54,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+01 2.824e+01 3.290e+01 3.735e+01 8.555e+01, threshold=6.581e+01, percent-clipped=2.0 2024-08-10 03:37:12,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=348080.0, ans=0.125 2024-08-10 03:37:13,305 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 from AS 2024-08-10 03:37:19,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=348080.0, ans=0.2 2024-08-10 03:37:23,554 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 from AS 2024-08-10 03:37:27,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=348180.0, ans=0.125 2024-08-10 03:37:33,627 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 03:37:38,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.43 vs. 
limit=15.0 2024-08-10 03:37:50,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5850, loss[loss=0.1336, beats_loss=0.01239, ecapa_loss=0.000314, whisper_loss=0.118, over 20708.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01237, ecapa_loss=0.0002937, whisper_loss=0.1002, over 3887421.26 frames. ], batch size: 83, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:37:53,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2024-08-10 03:38:03,422 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 32 from Vox, 34 from AS 2024-08-10 03:38:10,562 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 03:38:14,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=348480.0, ans=0.0 2024-08-10 03:38:18,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=348580.0, ans=0.02 2024-08-10 03:38:26,507 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 from AS 2024-08-10 03:38:34,721 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 19 from LS+wenet, 33 from Vox, 33 from AS 2024-08-10 03:38:36,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=348680.0, ans=0.2 2024-08-10 03:38:39,039 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 25 from Vox, 43 from AS 2024-08-10 03:38:40,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=348680.0, ans=0.0 2024-08-10 03:38:45,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=348680.0, ans=0.0 2024-08-10 03:39:00,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5900, loss[loss=0.08425, beats_loss=0.01224, ecapa_loss=0.0002864, whisper_loss=0.06915, over 16056.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01249, ecapa_loss=0.0002934, whisper_loss=0.09888, over 3861078.76 frames. ], batch size: 62, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:39:16,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 2.929e+01 3.311e+01 3.794e+01 5.610e+01, threshold=6.621e+01, percent-clipped=0.0 2024-08-10 03:39:22,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=348980.0, ans=0.0 2024-08-10 03:39:40,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=349180.0, ans=0.125 2024-08-10 03:39:43,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-10 03:39:47,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=349180.0, ans=0.05 2024-08-10 03:39:51,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=349180.0, ans=0.0 2024-08-10 03:40:09,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 5950, loss[loss=0.09931, beats_loss=0.01461, ecapa_loss=0.0002714, whisper_loss=0.08199, over 14823.00 frames. 
], tot_loss[loss=0.114, beats_loss=0.01253, ecapa_loss=0.000292, whisper_loss=0.09855, over 3869583.36 frames. ], batch size: 59, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:40:25,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-10 03:40:31,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2024-08-10 03:40:36,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=349580.0, ans=0.125 2024-08-10 03:40:47,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349580.0, ans=0.125 2024-08-10 03:41:17,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=349880.0, ans=0.025 2024-08-10 03:41:18,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6000, loss[loss=0.1145, beats_loss=0.01403, ecapa_loss=0.0002591, whisper_loss=0.09784, over 15170.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01252, ecapa_loss=0.0002896, whisper_loss=0.0988, over 3866153.46 frames. ], batch size: 59, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:41:18,608 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 03:41:57,752 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on ASR_libri: loss=0.2761, beats_loss=0, ecapa_loss=0.0008742, whisper_loss=0.2674, over 922467.00 frames. 2024-08-10 03:42:15,750 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on SV_voxceleb1: loss=0.007667, beats_loss=0, ecapa_loss=0.0007667, whisper_loss=0, over 939242.00 frames. 
2024-08-10 03:44:14,920 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on AT_audioset: loss=0.0285, beats_loss=0.0285, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 03:44:14,924 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 03:44:28,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=349980.0, ans=0.0 2024-08-10 03:44:32,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 3.043e+01 3.498e+01 4.267e+01 5.483e+01, threshold=6.996e+01, percent-clipped=0.0 2024-08-10 03:44:34,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=349980.0, ans=0.0 2024-08-10 03:44:38,190 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-10 03:44:45,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:46,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:48,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:44:49,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=350080.0, ans=0.125 2024-08-10 03:45:02,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.68 vs. 
limit=22.5 2024-08-10 03:45:09,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=350180.0, ans=10.0 2024-08-10 03:45:19,684 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 03:45:26,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6050, loss[loss=0.1438, beats_loss=0.01087, ecapa_loss=0.0002645, whisper_loss=0.1303, over 23170.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01248, ecapa_loss=0.0002896, whisper_loss=0.09907, over 3870747.53 frames. ], batch size: 88, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:45:43,617 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 29 from Vox, 19 fro AS 2024-08-10 03:45:45,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=350480.0, ans=0.0 2024-08-10 03:45:51,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=350480.0, ans=0.1 2024-08-10 03:46:36,529 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6100, loss[loss=0.08951, beats_loss=0.01855, ecapa_loss=0.000249, whisper_loss=0.06848, over 17887.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01249, ecapa_loss=0.000292, whisper_loss=0.09867, over 3882919.05 frames. ], batch size: 73, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:46:45,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350880.0, ans=0.125 2024-08-10 03:46:53,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.971e+01 3.424e+01 4.102e+01 1.085e+02, threshold=6.848e+01, percent-clipped=1.0 2024-08-10 03:46:55,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.96 vs. 
limit=10.0 2024-08-10 03:47:06,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=351080.0, ans=0.125 2024-08-10 03:47:22,728 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 03:47:35,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=351280.0, ans=10.0 2024-08-10 03:47:45,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6150, loss[loss=0.08033, beats_loss=0.01406, ecapa_loss=0.0003296, whisper_loss=0.06297, over 17725.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01246, ecapa_loss=0.0002912, whisper_loss=0.09841, over 3913938.82 frames. ], batch size: 74, lr: 1.92e-02, grad_scale: 4194304.0 2024-08-10 03:47:57,676 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 03:48:01,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=351480.0, ans=0.125 2024-08-10 03:48:03,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.95 vs. limit=15.0 2024-08-10 03:48:10,015 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 03:48:34,800 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 03:48:40,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351780.0, ans=0.125 2024-08-10 03:48:40,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=351780.0, ans=0.2 2024-08-10 03:48:51,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=351780.0, ans=0.125 2024-08-10 03:48:54,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6200, loss[loss=0.13, beats_loss=0.01001, ecapa_loss=0.000312, whisper_loss=0.1169, over 23023.00 frames. ], tot_loss[loss=0.1145, beats_loss=0.0124, ecapa_loss=0.0002909, whisper_loss=0.09923, over 3920513.77 frames. ], batch size: 90, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:48:58,140 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.487e-01 2024-08-10 03:49:04,572 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 03:49:11,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.480e+01 3.031e+01 3.409e+01 3.924e+01 5.999e+01, threshold=6.819e+01, percent-clipped=0.0 2024-08-10 03:49:20,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-08-10 03:49:22,289 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 03:49:51,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=352280.0, ans=0.125 2024-08-10 03:49:55,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. 
limit=10.0 2024-08-10 03:50:02,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6250, loss[loss=0.1193, beats_loss=0.01392, ecapa_loss=0.0002437, whisper_loss=0.1029, over 16767.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01232, ecapa_loss=0.0002929, whisper_loss=0.09979, over 3890787.59 frames. ], batch size: 62, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:50:08,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=352380.0, ans=0.125 2024-08-10 03:50:10,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=352380.0, ans=0.125 2024-08-10 03:50:18,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=352480.0, ans=0.0 2024-08-10 03:50:25,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-10 03:50:37,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352580.0, ans=0.0 2024-08-10 03:50:40,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352580.0, ans=0.1 2024-08-10 03:50:44,461 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 03:50:45,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=352680.0, ans=0.025 2024-08-10 03:50:49,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352680.0, ans=0.125 2024-08-10 03:50:52,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352680.0, ans=0.1 2024-08-10 03:50:54,708 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 03:50:55,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=15.0 2024-08-10 03:51:05,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352780.0, ans=0.125 2024-08-10 03:51:10,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6300, loss[loss=0.1146, beats_loss=0.01511, ecapa_loss=0.0002855, whisper_loss=0.09661, over 21293.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01231, ecapa_loss=0.0002931, whisper_loss=0.09963, over 3860179.79 frames. ], batch size: 86, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:51:22,167 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 03:51:25,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=352980.0, ans=0.2 2024-08-10 03:51:27,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 3.057e+01 3.444e+01 4.179e+01 1.718e+02, threshold=6.888e+01, percent-clipped=1.0 2024-08-10 03:51:42,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=353080.0, ans=0.07 2024-08-10 03:51:43,927 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 03:52:06,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353280.0, ans=0.1 2024-08-10 03:52:07,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=353280.0, ans=0.025 2024-08-10 03:52:11,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=353280.0, ans=0.2 2024-08-10 03:52:19,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6350, loss[loss=0.1258, beats_loss=0.01111, ecapa_loss=0.0003398, whisper_loss=0.1112, over 17026.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01238, ecapa_loss=0.0002936, whisper_loss=0.09909, over 3835819.25 frames. 
], batch size: 67, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:52:26,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=353380.0, ans=0.125 2024-08-10 03:52:36,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353480.0, ans=0.1 2024-08-10 03:52:57,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=353580.0, ans=0.0 2024-08-10 03:52:59,599 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 03:53:10,879 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-10 03:53:16,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353780.0, ans=0.125 2024-08-10 03:53:18,288 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 33 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 03:53:28,574 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6400, loss[loss=0.08711, beats_loss=0.01417, ecapa_loss=0.0003344, whisper_loss=0.06959, over 16193.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01241, ecapa_loss=0.0002914, whisper_loss=0.09953, over 3862220.55 frames. ], batch size: 68, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:53:35,339 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 03:53:41,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. 
limit=15.0 2024-08-10 03:53:44,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.873e+01 3.233e+01 3.602e+01 5.742e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-10 03:53:53,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=353980.0, ans=0.125 2024-08-10 03:54:22,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=354280.0, ans=0.0 2024-08-10 03:54:33,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354280.0, ans=0.1 2024-08-10 03:54:37,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6450, loss[loss=0.1127, beats_loss=0.01205, ecapa_loss=0.0003228, whisper_loss=0.09742, over 17973.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01239, ecapa_loss=0.0002912, whisper_loss=0.09956, over 3862625.65 frames. ], batch size: 74, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:54:39,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-08-10 03:55:00,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=354480.0, ans=0.125 2024-08-10 03:55:18,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=354680.0, ans=0.2 2024-08-10 03:55:38,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=354780.0, ans=0.125 2024-08-10 03:55:45,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6500, loss[loss=0.1117, beats_loss=0.01429, ecapa_loss=0.0002379, whisper_loss=0.09499, over 20678.00 frames. 
], tot_loss[loss=0.1156, beats_loss=0.01243, ecapa_loss=0.0002883, whisper_loss=0.1003, over 3898361.42 frames. ], batch size: 83, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:55:46,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354880.0, ans=0.125 2024-08-10 03:55:51,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=354880.0, ans=0.09899494936611666 2024-08-10 03:55:59,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=15.0 2024-08-10 03:56:02,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.914e+01 3.314e+01 3.758e+01 6.768e+01, threshold=6.629e+01, percent-clipped=1.0 2024-08-10 03:56:06,250 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-10 03:56:06,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=354980.0, ans=0.0 2024-08-10 03:56:12,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355080.0, ans=0.1 2024-08-10 03:56:25,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=355180.0, ans=0.0 2024-08-10 03:56:28,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.03 vs. 
limit=15.0 2024-08-10 03:56:33,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=355180.0, ans=0.2 2024-08-10 03:56:39,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=355280.0, ans=0.07 2024-08-10 03:56:40,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355280.0, ans=0.1 2024-08-10 03:56:53,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6550, loss[loss=0.1101, beats_loss=0.01243, ecapa_loss=0.0002944, whisper_loss=0.09471, over 22248.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01249, ecapa_loss=0.0002891, whisper_loss=0.1, over 3896946.17 frames. ], batch size: 88, lr: 1.91e-02, grad_scale: 4194304.0 2024-08-10 03:57:08,859 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 03:57:11,485 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 03:57:17,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=355480.0, ans=0.0 2024-08-10 03:57:31,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=355580.0, ans=0.0 2024-08-10 03:57:39,102 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 03:57:49,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. 
limit=15.0 2024-08-10 03:57:52,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=355780.0, ans=0.125 2024-08-10 03:57:54,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=355780.0, ans=0.0 2024-08-10 03:58:01,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6600, loss[loss=0.1214, beats_loss=0.01166, ecapa_loss=0.0002838, whisper_loss=0.1069, over 18615.00 frames. ], tot_loss[loss=0.1168, beats_loss=0.01237, ecapa_loss=0.0002906, whisper_loss=0.1015, over 3922253.09 frames. ], batch size: 73, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:58:03,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2024-08-10 03:58:04,338 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 03:58:05,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.03 vs. 
limit=15.0 2024-08-10 03:58:10,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=355880.0, ans=0.125 2024-08-10 03:58:18,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.136e+01 3.510e+01 4.053e+01 6.821e+01, threshold=7.019e+01, percent-clipped=1.0 2024-08-10 03:58:33,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=356080.0, ans=0.125 2024-08-10 03:58:44,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=356180.0, ans=0.125 2024-08-10 03:58:47,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=356180.0, ans=0.0 2024-08-10 03:58:55,669 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 03:59:02,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356280.0, ans=0.125 2024-08-10 03:59:05,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=356280.0, ans=0.025 2024-08-10 03:59:10,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6650, loss[loss=0.09729, beats_loss=0.01385, ecapa_loss=0.0002728, whisper_loss=0.08071, over 19164.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01237, ecapa_loss=0.0002911, whisper_loss=0.101, over 3927309.21 frames. ], batch size: 76, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 03:59:13,508 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 03:59:16,494 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 03:59:22,985 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
30 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 03:59:35,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356480.0, ans=0.125 2024-08-10 03:59:37,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=356580.0, ans=0.025 2024-08-10 03:59:39,753 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 04:00:04,013 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 34 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 04:00:09,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=356780.0, ans=0.0 2024-08-10 04:00:14,091 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 04:00:19,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6700, loss[loss=0.09311, beats_loss=0.01454, ecapa_loss=0.0002883, whisper_loss=0.07569, over 19374.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01231, ecapa_loss=0.000292, whisper_loss=0.1007, over 3908156.40 frames. 
], batch size: 80, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:00:32,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356980.0, ans=0.1 2024-08-10 04:00:35,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.886e+01 3.265e+01 3.693e+01 7.385e+01, threshold=6.529e+01, percent-clipped=1.0 2024-08-10 04:00:38,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=356980.0, ans=0.125 2024-08-10 04:00:40,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=356980.0, ans=0.125 2024-08-10 04:00:55,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-08-10 04:01:05,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=357180.0, ans=0.125 2024-08-10 04:01:07,898 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 04:01:14,809 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 04:01:28,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6750, loss[loss=0.1134, beats_loss=0.01179, ecapa_loss=0.0003445, whisper_loss=0.0982, over 21586.00 frames. ], tot_loss[loss=0.1164, beats_loss=0.01222, ecapa_loss=0.0002923, whisper_loss=0.1013, over 3900585.77 frames. 
], batch size: 89, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:01:49,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=357480.0, ans=0.05 2024-08-10 04:02:05,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.65 vs. limit=22.5 2024-08-10 04:02:20,710 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-10 04:02:27,728 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 14 from Vox, 54 fro AS 2024-08-10 04:02:36,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=357880.0, ans=0.125 2024-08-10 04:02:36,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5 2024-08-10 04:02:37,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6800, loss[loss=0.1153, beats_loss=0.01188, ecapa_loss=0.0003261, whisper_loss=0.1001, over 22153.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01227, ecapa_loss=0.0002943, whisper_loss=0.1008, over 3854699.07 frames. ], batch size: 89, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:02:37,358 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 04:02:47,048 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 04:02:48,730 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-10 04:02:52,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.84 vs. 
limit=22.5 2024-08-10 04:02:54,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.970e+01 3.321e+01 3.801e+01 1.301e+02, threshold=6.643e+01, percent-clipped=3.0 2024-08-10 04:02:55,560 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 04:03:12,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=358080.0, ans=0.125 2024-08-10 04:03:16,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358080.0, ans=0.125 2024-08-10 04:03:17,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-10 04:03:20,503 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 04:03:25,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=358180.0, ans=0.0 2024-08-10 04:03:26,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=358180.0, ans=0.125 2024-08-10 04:03:40,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=358280.0, ans=0.125 2024-08-10 04:03:43,654 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 04:03:46,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6850, loss[loss=0.1326, beats_loss=0.0109, ecapa_loss=0.0002807, whisper_loss=0.1189, over 23278.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01217, ecapa_loss=0.0002945, whisper_loss=0.1011, over 3867367.14 frames. 
], batch size: 90, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:03:57,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-08-10 04:04:03,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=358480.0, ans=0.0 2024-08-10 04:04:14,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-10 04:04:45,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=358780.0, ans=0.125 2024-08-10 04:04:54,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6900, loss[loss=0.1337, beats_loss=0.01095, ecapa_loss=0.0003284, whisper_loss=0.1195, over 14449.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.0122, ecapa_loss=0.0002943, whisper_loss=0.1009, over 3882337.12 frames. ], batch size: 57, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:05:10,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.37 vs. limit=15.0 2024-08-10 04:05:10,686 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.982e+01 3.330e+01 3.890e+01 5.660e+01, threshold=6.660e+01, percent-clipped=0.0 2024-08-10 04:06:03,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 6950, loss[loss=0.1265, beats_loss=0.008556, ecapa_loss=0.0003164, whisper_loss=0.1148, over 16644.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01228, ecapa_loss=0.0002932, whisper_loss=0.1005, over 3888592.66 frames. 
], batch size: 65, lr: 1.90e-02, grad_scale: 4194304.0 2024-08-10 04:06:06,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=359380.0, ans=0.2 2024-08-10 04:06:21,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=359480.0, ans=0.125 2024-08-10 04:06:26,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=359480.0, ans=0.125 2024-08-10 04:06:38,606 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-10 04:06:56,771 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 04:07:13,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7000, loss[loss=0.09494, beats_loss=0.01327, ecapa_loss=0.0003128, whisper_loss=0.07854, over 21010.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01225, ecapa_loss=0.0002937, whisper_loss=0.0996, over 3876946.69 frames. 
], batch size: 84, lr: 1.89e-02, grad_scale: 4194304.0 2024-08-10 04:07:14,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=359880.0, ans=0.0 2024-08-10 04:07:28,765 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-36000.pt 2024-08-10 04:07:32,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.852e+01 3.263e+01 3.844e+01 5.295e+01, threshold=6.525e+01, percent-clipped=0.0 2024-08-10 04:07:48,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=360080.0, ans=0.125 2024-08-10 04:07:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=360180.0, ans=0.125 2024-08-10 04:08:01,809 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 04:08:20,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=360280.0, ans=0.2 2024-08-10 04:08:21,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=360280.0, ans=0.05 2024-08-10 04:08:23,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=360380.0, ans=0.125 2024-08-10 04:08:24,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7050, loss[loss=0.128, beats_loss=0.01043, ecapa_loss=0.0002746, whisper_loss=0.1148, over 19010.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01229, ecapa_loss=0.000293, whisper_loss=0.09972, over 3881779.80 frames. 
], batch size: 72, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:08:30,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=360380.0, ans=0.125 2024-08-10 04:08:34,328 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 from AS 2024-08-10 04:08:42,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2024-08-10 04:08:44,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2024-08-10 04:08:49,616 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 04:08:53,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=360580.0, ans=0.2 2024-08-10 04:08:59,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360580.0, ans=0.1 2024-08-10 04:09:11,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360680.0, ans=0.1 2024-08-10 04:09:12,933 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 04:09:13,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.80 vs. limit=22.5 2024-08-10 04:09:32,609 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7100, loss[loss=0.1092, beats_loss=0.008115, ecapa_loss=0.000328, whisper_loss=0.0978, over 13583.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01231, ecapa_loss=0.0002904, whisper_loss=0.09915, over 3867042.53 frames. 
], batch size: 53, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:09:36,863 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 16 from LS+wenet, 19 from Vox, 55 from AS 2024-08-10 04:09:48,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.064e+01 3.569e+01 4.090e+01 1.167e+02, threshold=7.137e+01, percent-clipped=2.0 2024-08-10 04:10:18,430 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 04:10:41,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7150, loss[loss=0.1111, beats_loss=0.01373, ecapa_loss=0.0002516, whisper_loss=0.09484, over 22600.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01238, ecapa_loss=0.0002897, whisper_loss=0.09912, over 3904300.20 frames. ], batch size: 93, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:11:06,356 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 22 from Vox, 19 from AS 2024-08-10 04:11:13,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0 2024-08-10 04:11:19,424 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 16 from LS+wenet, 28 from Vox, 46 from AS 2024-08-10 04:11:19,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-10 04:11:24,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. 
limit=15.0 2024-08-10 04:11:43,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=361780.0, ans=0.0 2024-08-10 04:11:47,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=361780.0, ans=0.125 2024-08-10 04:11:50,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7200, loss[loss=0.1066, beats_loss=0.01497, ecapa_loss=0.0002238, whisper_loss=0.08936, over 21234.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01243, ecapa_loss=0.0002905, whisper_loss=0.09811, over 3872559.83 frames. ], batch size: 83, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:11:51,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361880.0, ans=0.1 2024-08-10 04:11:52,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=361880.0, ans=0.1 2024-08-10 04:11:52,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=361880.0, ans=0.2 2024-08-10 04:12:03,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2024-08-10 04:12:04,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361980.0, ans=0.1 2024-08-10 04:12:07,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.897e+01 3.291e+01 3.668e+01 6.348e+01, threshold=6.581e+01, percent-clipped=0.0 2024-08-10 04:12:13,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=361980.0, ans=0.2 2024-08-10 04:12:37,233 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 12 from Vox, 43 from AS 2024-08-10 04:12:53,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=362280.0, ans=0.125 2024-08-10 04:13:01,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7250, loss[loss=0.1062, beats_loss=0.01215, ecapa_loss=0.0003318, whisper_loss=0.09071, over 15290.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.0124, ecapa_loss=0.0002911, whisper_loss=0.09822, over 3871609.01 frames. ], batch size: 63, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:13:01,793 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 from AS 2024-08-10 04:13:10,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=362380.0, ans=0.125 2024-08-10 04:13:11,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362380.0, ans=0.125 2024-08-10 04:13:19,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=362480.0, ans=0.125 2024-08-10 04:13:20,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=362480.0, ans=0.0 2024-08-10 04:13:29,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2024-08-10 04:13:37,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=362580.0, ans=0.125 2024-08-10 04:13:40,412 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 16 from Vox, 35 from AS 2024-08-10 04:13:49,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-08-10 04:14:12,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7300, loss[loss=0.1167, beats_loss=0.01478, ecapa_loss=0.0002513, whisper_loss=0.09937, over 23350.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01243, ecapa_loss=0.0002912, whisper_loss=0.09868, over 3884323.41 frames. ], batch size: 94, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:14:30,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.958e+01 3.364e+01 4.070e+01 6.476e+01, threshold=6.728e+01, percent-clipped=0.0 2024-08-10 04:14:35,016 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 04:14:50,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=363080.0, ans=0.0 2024-08-10 04:14:51,785 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 04:14:52,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=12.0 2024-08-10 04:15:00,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. 
limit=15.0 2024-08-10 04:15:13,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=363280.0, ans=0.125 2024-08-10 04:15:14,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363280.0, ans=0.125 2024-08-10 04:15:16,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=363280.0, ans=0.2 2024-08-10 04:15:24,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7350, loss[loss=0.113, beats_loss=0.009109, ecapa_loss=0.0003887, whisper_loss=0.1, over 17877.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01244, ecapa_loss=0.0002914, whisper_loss=0.09769, over 3875624.17 frames. ], batch size: 76, lr: 1.89e-02, grad_scale: 8388608.0 2024-08-10 04:15:24,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=363380.0, ans=0.0 2024-08-10 04:15:28,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. 
limit=15.0 2024-08-10 04:16:27,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=363780.0, ans=0.2 2024-08-10 04:16:28,702 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.858e-02 2024-08-10 04:16:30,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=363780.0, ans=0.015 2024-08-10 04:16:34,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363780.0, ans=0.1 2024-08-10 04:16:36,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7400, loss[loss=0.1044, beats_loss=0.01305, ecapa_loss=0.0003133, whisper_loss=0.08822, over 21366.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01243, ecapa_loss=0.0002907, whisper_loss=0.0986, over 3880498.51 frames. ], batch size: 84, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:16:37,176 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-10 04:16:49,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=363880.0, ans=15.0 2024-08-10 04:16:54,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 3.042e+01 3.418e+01 4.034e+01 8.204e+01, threshold=6.837e+01, percent-clipped=2.0 2024-08-10 04:16:57,812 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 from AS 2024-08-10 04:17:03,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363980.0, ans=0.125 2024-08-10 04:17:28,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=364180.0, ans=0.125 2024-08-10 04:17:43,424 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
26 from LS+wenet, 19 from Vox, 17 from AS 2024-08-10 04:17:49,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7450, loss[loss=0.1137, beats_loss=0.01309, ecapa_loss=0.0002598, whisper_loss=0.098, over 22433.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.01235, ecapa_loss=0.0002931, whisper_loss=0.09958, over 3898033.13 frames. ], batch size: 89, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:17:52,267 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 from AS 2024-08-10 04:17:55,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2024-08-10 04:18:15,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=364480.0, ans=0.125 2024-08-10 04:18:24,030 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 04:18:57,934 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS 2024-08-10 04:19:03,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7500, loss[loss=0.1243, beats_loss=0.01228, ecapa_loss=0.000288, whisper_loss=0.1091, over 19296.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01228, ecapa_loss=0.0002936, whisper_loss=0.1004, over 3918281.04 frames. ], batch size: 78, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:19:20,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.956e+01 3.355e+01 3.815e+01 8.528e+01, threshold=6.709e+01, percent-clipped=1.0 2024-08-10 04:19:25,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364980.0, ans=0.1 2024-08-10 04:19:49,169 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
14 from LS+wenet, 17 from Vox, 23 from AS 2024-08-10 04:20:09,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=365280.0, ans=0.125 2024-08-10 04:20:16,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7550, loss[loss=0.1166, beats_loss=0.01397, ecapa_loss=0.000288, whisper_loss=0.09973, over 21963.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01235, ecapa_loss=0.0002941, whisper_loss=0.0998, over 3917973.28 frames. ], batch size: 88, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:20:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=365380.0, ans=0.2 2024-08-10 04:20:18,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=365380.0, ans=0.125 2024-08-10 04:20:20,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=365380.0, ans=0.125 2024-08-10 04:20:30,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=365480.0, ans=0.025 2024-08-10 04:20:44,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=365580.0, ans=0.125 2024-08-10 04:21:08,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=365680.0, ans=0.0 2024-08-10 04:21:15,361 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 26 from Vox, 33 from AS 2024-08-10 04:21:27,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365780.0, ans=0.125 2024-08-10 04:21:30,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7600, loss[loss=0.128, beats_loss=0.01286, ecapa_loss=0.0002326, whisper_loss=0.1128, over 17717.00 frames. 
], tot_loss[loss=0.1145, beats_loss=0.01238, ecapa_loss=0.0002951, whisper_loss=0.0992, over 3874261.45 frames. ], batch size: 67, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:21:35,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-08-10 04:21:41,415 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 04:21:46,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+01 3.083e+01 3.503e+01 3.988e+01 6.295e+01, threshold=7.005e+01, percent-clipped=0.0 2024-08-10 04:21:59,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=366080.0, ans=0.0 2024-08-10 04:22:04,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-10 04:22:12,975 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 17 from Vox, 46 from AS 2024-08-10 04:22:15,700 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS 2024-08-10 04:22:36,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=12.0 2024-08-10 04:22:42,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7650, loss[loss=0.1044, beats_loss=0.01498, ecapa_loss=0.0002713, whisper_loss=0.08667, over 18071.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01229, ecapa_loss=0.0002937, whisper_loss=0.09947, over 3888760.25 frames. ], batch size: 73, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:22:53,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.82 vs. 
limit=22.5 2024-08-10 04:23:00,515 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 04:23:00,875 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.281e-01 2024-08-10 04:23:23,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=366580.0, ans=0.0 2024-08-10 04:23:25,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366680.0, ans=0.1 2024-08-10 04:23:54,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7700, loss[loss=0.1137, beats_loss=0.009586, ecapa_loss=0.0002616, whisper_loss=0.1015, over 18627.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01232, ecapa_loss=0.0002952, whisper_loss=0.0985, over 3902654.48 frames. ], batch size: 68, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:23:57,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=366880.0, ans=0.125 2024-08-10 04:24:01,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366880.0, ans=0.1 2024-08-10 04:24:07,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=366880.0, ans=0.0 2024-08-10 04:24:12,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.962e+01 3.373e+01 3.972e+01 7.552e+01, threshold=6.745e+01, percent-clipped=1.0 2024-08-10 04:24:12,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=366980.0, ans=0.125 2024-08-10 04:24:14,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, 
metric=14.04 vs. limit=15.0 2024-08-10 04:24:16,399 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-10 04:24:33,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=367080.0, ans=0.125 2024-08-10 04:24:41,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-10 04:24:42,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=15.0 2024-08-10 04:24:48,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=367180.0, ans=0.125 2024-08-10 04:24:53,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-10 04:25:06,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7750, loss[loss=0.1171, beats_loss=0.01231, ecapa_loss=0.0002392, whisper_loss=0.1024, over 15704.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01231, ecapa_loss=0.0002929, whisper_loss=0.09826, over 3888661.35 frames. ], batch size: 57, lr: 1.88e-02, grad_scale: 8388608.0 2024-08-10 04:25:11,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367380.0, ans=0.125 2024-08-10 04:25:17,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367380.0, ans=0.125 2024-08-10 04:25:25,652 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
11 from LS+wenet, 28 from Vox, 23 from AS 2024-08-10 04:25:38,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=367580.0, ans=0.1 2024-08-10 04:25:59,741 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 from AS 2024-08-10 04:26:01,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=367680.0, ans=0.025 2024-08-10 04:26:04,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-10 04:26:07,905 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 from AS 2024-08-10 04:26:17,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=367880.0, ans=0.0 2024-08-10 04:26:18,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7800, loss[loss=0.1124, beats_loss=0.01228, ecapa_loss=0.0002883, whisper_loss=0.0972, over 20209.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01231, ecapa_loss=0.0002935, whisper_loss=0.09819, over 3886941.06 frames. ], batch size: 83, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:26:25,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=367880.0, ans=0.0 2024-08-10 04:26:26,705 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 from AS 2024-08-10 04:26:33,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=367980.0, ans=0.1 2024-08-10 04:26:34,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.424e+01 3.081e+01 3.363e+01 3.893e+01 6.913e+01, threshold=6.726e+01, percent-clipped=1.0 2024-08-10 04:26:36,374 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 16 from Vox, 36 from AS 2024-08-10 04:26:43,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=367980.0, ans=0.125 2024-08-10 04:26:44,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=368080.0, ans=0.125 2024-08-10 04:26:49,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-08-10 04:26:57,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=368080.0, ans=0.125 2024-08-10 04:27:12,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=368280.0, ans=0.125 2024-08-10 04:27:25,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=368280.0, ans=0.0 2024-08-10 04:27:26,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-10 04:27:28,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7850, loss[loss=0.088, beats_loss=0.01016, ecapa_loss=0.00028, whisper_loss=0.07504, over 14537.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01231, ecapa_loss=0.0002926, whisper_loss=0.0988, over 3894287.43 frames. ], batch size: 56, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:27:37,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=368380.0, ans=0.2 2024-08-10 04:27:42,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. 
limit=10.0 2024-08-10 04:27:44,356 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 20 from Vox, 19 from AS 2024-08-10 04:27:50,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=368480.0, ans=0.05 2024-08-10 04:28:02,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=12.0 2024-08-10 04:28:03,873 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 04:28:06,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.11 vs. limit=22.5 2024-08-10 04:28:11,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-08-10 04:28:14,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=368680.0, ans=0.125 2024-08-10 04:28:25,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=368780.0, ans=0.125 2024-08-10 04:28:31,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=368780.0, ans=0.125 2024-08-10 04:28:38,449 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7900, loss[loss=0.1105, beats_loss=0.01178, ecapa_loss=0.0003128, whisper_loss=0.09563, over 22615.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01241, ecapa_loss=0.0002919, whisper_loss=0.09853, over 3886369.80 frames. ], batch size: 92, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:28:49,483 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 04:28:54,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+01 2.975e+01 3.379e+01 4.027e+01 6.816e+01, threshold=6.758e+01, percent-clipped=1.0 2024-08-10 04:28:55,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=368980.0, ans=0.0 2024-08-10 04:29:00,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=368980.0, ans=0.125 2024-08-10 04:29:14,378 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 04:29:26,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.03 vs. limit=22.5 2024-08-10 04:29:33,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=369280.0, ans=0.125 2024-08-10 04:29:44,079 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 04:29:44,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=369280.0, ans=0.125 2024-08-10 04:29:47,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 7950, loss[loss=0.09741, beats_loss=0.01346, ecapa_loss=0.0002862, whisper_loss=0.08109, over 18789.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01242, ecapa_loss=0.00029, whisper_loss=0.09861, over 3877722.17 frames. 
], batch size: 77, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:29:54,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=369380.0, ans=0.0 2024-08-10 04:29:59,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=369380.0, ans=0.125 2024-08-10 04:30:02,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=369480.0, ans=0.0 2024-08-10 04:30:09,946 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 34 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 04:30:25,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=369580.0, ans=0.0 2024-08-10 04:30:30,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369680.0, ans=0.125 2024-08-10 04:30:47,108 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 12 from Vox, 30 from AS 2024-08-10 04:30:48,678 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 from AS 2024-08-10 04:30:53,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369780.0, ans=0.125 2024-08-10 04:30:56,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8000, loss[loss=0.1145, beats_loss=0.0127, ecapa_loss=0.0002545, whisper_loss=0.09921, over 21136.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01231, ecapa_loss=0.0002886, whisper_loss=0.09879, over 3884539.45 frames. 
], batch size: 84, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:31:01,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369880.0, ans=0.1 2024-08-10 04:31:02,268 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 04:31:06,420 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 04:31:13,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 3.026e+01 3.341e+01 3.954e+01 6.055e+01, threshold=6.681e+01, percent-clipped=0.0 2024-08-10 04:31:22,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=369980.0, ans=0.125 2024-08-10 04:31:24,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=370080.0, ans=0.05 2024-08-10 04:31:31,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=370080.0, ans=0.2 2024-08-10 04:31:41,228 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 from AS 2024-08-10 04:31:47,698 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 from AS 2024-08-10 04:32:02,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-10 04:32:04,208 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 04:32:04,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=370380.0, ans=0.125 2024-08-10 04:32:05,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8050, loss[loss=0.1185, beats_loss=0.01072, ecapa_loss=0.000307, whisper_loss=0.1047, over 15889.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01231, ecapa_loss=0.0002879, whisper_loss=0.09898, over 3882501.57 frames. ], batch size: 63, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:32:09,744 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 04:32:12,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=370380.0, ans=0.125 2024-08-10 04:32:13,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-08-10 04:32:20,077 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 04:32:21,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=370480.0, ans=0.0 2024-08-10 04:32:42,325 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 04:32:43,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=370580.0, ans=0.125 2024-08-10 04:33:08,590 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 04:33:14,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8100, loss[loss=0.1217, beats_loss=0.008431, ecapa_loss=0.0003569, whisper_loss=0.1097, over 13307.00 frames. 
], tot_loss[loss=0.1144, beats_loss=0.0123, ecapa_loss=0.000288, whisper_loss=0.09924, over 3885487.48 frames. ], batch size: 55, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:33:16,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=370880.0, ans=0.125 2024-08-10 04:33:26,480 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 04:33:26,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370880.0, ans=0.1 2024-08-10 04:33:31,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.921e+01 3.268e+01 3.818e+01 1.425e+02, threshold=6.536e+01, percent-clipped=1.0 2024-08-10 04:33:31,712 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 04:33:32,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.04 vs. limit=15.0 2024-08-10 04:33:33,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370980.0, ans=0.1 2024-08-10 04:33:59,349 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 04:34:04,954 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 04:34:12,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=371280.0, ans=0.0 2024-08-10 04:34:20,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.37 vs. 
limit=22.5 2024-08-10 04:34:23,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8150, loss[loss=0.1156, beats_loss=0.01533, ecapa_loss=0.0002376, whisper_loss=0.09784, over 23220.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01229, ecapa_loss=0.0002885, whisper_loss=0.09951, over 3883836.24 frames. ], batch size: 89, lr: 1.87e-02, grad_scale: 8388608.0 2024-08-10 04:34:28,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-10 04:34:29,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.93 vs. limit=15.0 2024-08-10 04:34:36,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=371480.0, ans=0.0 2024-08-10 04:34:43,197 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 13 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 04:34:52,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=371580.0, ans=0.125 2024-08-10 04:35:14,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=371680.0, ans=0.0 2024-08-10 04:35:24,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.90 vs. limit=15.0 2024-08-10 04:35:28,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=371780.0, ans=0.0 2024-08-10 04:35:31,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8200, loss[loss=0.1224, beats_loss=0.01047, ecapa_loss=0.0002765, whisper_loss=0.1092, over 18789.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01225, ecapa_loss=0.0002914, whisper_loss=0.09946, over 3896289.69 frames. 
], batch size: 75, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:35:37,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2024-08-10 04:35:37,760 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 04:35:48,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 2.993e+01 3.348e+01 3.834e+01 8.342e+01, threshold=6.697e+01, percent-clipped=3.0 2024-08-10 04:35:49,016 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 04:36:00,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2024-08-10 04:36:06,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-10 04:36:19,594 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 04:36:22,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=372180.0, ans=0.125 2024-08-10 04:36:24,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372180.0, ans=0.1 2024-08-10 04:36:26,628 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 04:36:28,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=372280.0, ans=0.125 2024-08-10 04:36:30,908 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 04:36:31,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=372280.0, ans=0.0 2024-08-10 04:36:37,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372280.0, ans=0.125 2024-08-10 04:36:42,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8250, loss[loss=0.1128, beats_loss=0.01455, ecapa_loss=0.0002683, whisper_loss=0.09555, over 22235.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01231, ecapa_loss=0.0002889, whisper_loss=0.09907, over 3887574.42 frames. ], batch size: 87, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:36:44,353 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 04:36:45,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372380.0, ans=0.1 2024-08-10 04:37:27,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=372680.0, ans=0.0 2024-08-10 04:37:31,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=372680.0, ans=0.0 2024-08-10 04:37:34,544 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 17 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-10 04:37:34,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=372680.0, ans=0.0 2024-08-10 04:37:39,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=372780.0, ans=0.125 2024-08-10 04:37:41,786 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 04:37:48,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=372780.0, ans=0.125 2024-08-10 04:37:54,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8300, loss[loss=0.1147, beats_loss=0.01192, ecapa_loss=0.0002414, whisper_loss=0.1003, over 23315.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01226, ecapa_loss=0.0002892, whisper_loss=0.09879, over 3891759.74 frames. ], batch size: 91, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:38:05,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=372880.0, ans=0.125 2024-08-10 04:38:08,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=372980.0, ans=0.125 2024-08-10 04:38:12,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+01 3.111e+01 3.544e+01 4.051e+01 1.362e+02, threshold=7.088e+01, percent-clipped=2.0 2024-08-10 04:38:18,606 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 04:38:33,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=373080.0, ans=0.125 2024-08-10 04:38:33,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0 2024-08-10 04:38:49,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=373180.0, ans=0.2 2024-08-10 04:38:54,478 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 04:39:00,404 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-10 04:39:02,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373280.0, ans=0.125 2024-08-10 04:39:07,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8350, loss[loss=0.09375, beats_loss=0.00953, ecapa_loss=0.0003454, whisper_loss=0.08077, over 17163.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01229, ecapa_loss=0.0002886, whisper_loss=0.09815, over 3886080.21 frames. ], batch size: 74, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:39:13,170 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 04:39:15,953 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 04:39:42,741 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 04:39:56,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373680.0, ans=0.1 2024-08-10 04:40:17,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=373780.0, ans=0.0 2024-08-10 04:40:26,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8400, loss[loss=0.1099, beats_loss=0.008311, ecapa_loss=0.0003216, whisper_loss=0.09839, over 14476.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01224, ecapa_loss=0.0002881, whisper_loss=0.0984, over 3875565.50 frames. ], batch size: 53, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:40:26,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=373880.0, ans=0.125 2024-08-10 04:40:37,644 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 04:40:41,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5 2024-08-10 04:40:48,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+01 2.937e+01 3.360e+01 3.795e+01 5.469e+01, threshold=6.721e+01, percent-clipped=0.0 2024-08-10 04:41:17,667 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 04:41:23,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=374180.0, ans=0.2 2024-08-10 04:41:44,421 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 04:41:55,201 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-10 04:41:57,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8450, loss[loss=0.1079, beats_loss=0.01439, ecapa_loss=0.0002346, whisper_loss=0.0912, over 23309.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01222, ecapa_loss=0.0002882, whisper_loss=0.099, over 3862458.40 frames. ], batch size: 93, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:42:07,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=374380.0, ans=0.05 2024-08-10 04:42:08,370 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 04:42:08,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=374380.0, ans=0.125 2024-08-10 04:42:15,698 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 32 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-10 04:42:41,815 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 04:42:54,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=374680.0, ans=0.0 2024-08-10 04:42:59,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374680.0, ans=0.1 2024-08-10 04:43:10,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=15.0 2024-08-10 04:43:12,186 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.860e+05 2024-08-10 04:43:27,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8500, loss[loss=0.1108, beats_loss=0.01469, ecapa_loss=0.0002826, whisper_loss=0.0933, over 22629.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01224, ecapa_loss=0.0002891, whisper_loss=0.09958, over 3890342.52 frames. ], batch size: 94, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:43:31,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=374880.0, ans=0.125 2024-08-10 04:43:48,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.067e+01 3.351e+01 3.844e+01 5.655e+01, threshold=6.702e+01, percent-clipped=0.0 2024-08-10 04:44:03,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=375080.0, ans=0.0 2024-08-10 04:44:12,960 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 04:44:25,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=375180.0, ans=0.0 2024-08-10 04:44:28,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=375180.0, ans=0.04949747468305833 2024-08-10 04:44:54,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8550, loss[loss=0.1247, beats_loss=0.01309, ecapa_loss=0.0003019, whisper_loss=0.1085, over 21478.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.0122, ecapa_loss=0.0002889, whisper_loss=0.1001, over 3910683.93 frames. ], batch size: 86, lr: 1.86e-02, grad_scale: 8388608.0 2024-08-10 04:45:17,633 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 04:45:23,865 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 04:45:45,291 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 04:45:49,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=375680.0, ans=0.2 2024-08-10 04:46:11,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8600, loss[loss=0.09356, beats_loss=0.01359, ecapa_loss=0.0003409, whisper_loss=0.07657, over 18436.00 frames. ], tot_loss[loss=0.1159, beats_loss=0.01226, ecapa_loss=0.0002885, whisper_loss=0.1008, over 3910660.18 frames. ], batch size: 77, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:46:12,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. 
limit=15.0 2024-08-10 04:46:27,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.077e+01 3.509e+01 3.969e+01 6.307e+01, threshold=7.019e+01, percent-clipped=0.0 2024-08-10 04:46:32,200 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 10 from LS+wenet, 9 from Vox, 38 fro AS 2024-08-10 04:46:36,289 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 04:46:57,789 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 04:47:03,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376180.0, ans=0.1 2024-08-10 04:47:13,966 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 04:47:18,729 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 04:47:20,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.09 vs. limit=15.0 2024-08-10 04:47:21,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8650, loss[loss=0.09014, beats_loss=0.0162, ecapa_loss=0.0002688, whisper_loss=0.07125, over 21994.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.01233, ecapa_loss=0.0002879, whisper_loss=0.1001, over 3915703.77 frames. ], batch size: 95, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:47:22,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-10 04:47:22,938 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 04:47:28,842 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 04:47:31,595 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 04:47:34,138 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-10 04:47:37,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-08-10 04:47:44,442 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.128e+03 2024-08-10 04:47:54,192 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 04:47:54,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=376580.0, ans=0.1 2024-08-10 04:48:24,638 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 04:48:31,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8700, loss[loss=0.1293, beats_loss=0.01397, ecapa_loss=0.0002869, whisper_loss=0.1125, over 21962.00 frames. ], tot_loss[loss=0.1161, beats_loss=0.01223, ecapa_loss=0.0002888, whisper_loss=0.101, over 3902867.21 frames. ], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:48:35,779 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 04:48:40,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.03 vs. limit=15.0 2024-08-10 04:48:41,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.93 vs. 
limit=15.0 2024-08-10 04:48:47,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.537e+01 3.015e+01 3.371e+01 3.912e+01 6.380e+01, threshold=6.741e+01, percent-clipped=0.0 2024-08-10 04:48:49,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=376980.0, ans=0.125 2024-08-10 04:48:49,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=376980.0, ans=0.0 2024-08-10 04:48:50,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=376980.0, ans=0.125 2024-08-10 04:49:03,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377080.0, ans=0.1 2024-08-10 04:49:14,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-10 04:49:15,445 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 04:49:39,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8750, loss[loss=0.1184, beats_loss=0.01075, ecapa_loss=0.0003114, whisper_loss=0.1046, over 22281.00 frames. ], tot_loss[loss=0.1163, beats_loss=0.01214, ecapa_loss=0.0002911, whisper_loss=0.1013, over 3871670.56 frames. ], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:49:46,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=377380.0, ans=0.125 2024-08-10 04:49:57,812 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 04:50:01,637 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 04:50:03,177 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-10 04:50:19,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-08-10 04:50:19,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=377680.0, ans=0.125 2024-08-10 04:50:20,979 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 04:50:25,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=377680.0, ans=0.07 2024-08-10 04:50:29,158 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 04:50:47,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8800, loss[loss=0.1353, beats_loss=0.00991, ecapa_loss=0.0003117, whisper_loss=0.1223, over 22113.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01226, ecapa_loss=0.0002893, whisper_loss=0.1006, over 3897876.74 frames. ], batch size: 87, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:50:56,250 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 04:51:01,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.88 vs. 
limit=22.5 2024-08-10 04:51:04,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.131e+01 3.473e+01 4.096e+01 6.875e+01, threshold=6.946e+01, percent-clipped=1.0 2024-08-10 04:51:06,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377980.0, ans=0.125 2024-08-10 04:51:17,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-10 04:51:19,618 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 04:51:26,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=378080.0, ans=0.0 2024-08-10 04:51:27,801 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 04:51:38,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378180.0, ans=0.125 2024-08-10 04:51:43,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2024-08-10 04:51:49,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=378280.0, ans=0.0 2024-08-10 04:51:57,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8850, loss[loss=0.1197, beats_loss=0.01234, ecapa_loss=0.0002628, whisper_loss=0.1047, over 22641.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.0122, ecapa_loss=0.0002879, whisper_loss=0.1001, over 3880605.45 frames. 
], batch size: 91, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:52:16,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=378480.0, ans=0.0 2024-08-10 04:52:21,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378480.0, ans=0.0 2024-08-10 04:52:23,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=378480.0, ans=0.0 2024-08-10 04:52:28,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378580.0, ans=0.125 2024-08-10 04:52:44,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=378680.0, ans=0.125 2024-08-10 04:52:52,687 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 04:52:52,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=378780.0, ans=0.0 2024-08-10 04:53:05,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8900, loss[loss=0.1144, beats_loss=0.0154, ecapa_loss=0.0002396, whisper_loss=0.09662, over 19218.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01226, ecapa_loss=0.0002874, whisper_loss=0.09999, over 3879031.75 frames. ], batch size: 76, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:53:12,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=378880.0, ans=0.125 2024-08-10 04:53:18,227 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 04:53:22,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+01 3.017e+01 3.379e+01 3.848e+01 7.752e+01, threshold=6.759e+01, percent-clipped=1.0 2024-08-10 04:53:22,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378980.0, ans=0.1 2024-08-10 04:53:51,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=379180.0, ans=0.0 2024-08-10 04:53:54,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=379180.0, ans=0.0 2024-08-10 04:54:14,273 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 8950, loss[loss=0.1093, beats_loss=0.01318, ecapa_loss=0.0002697, whisper_loss=0.09344, over 21662.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01218, ecapa_loss=0.0002865, whisper_loss=0.1004, over 3874233.27 frames. ], batch size: 88, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:54:18,213 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 04:54:31,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=379480.0, ans=0.125 2024-08-10 04:54:35,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=379480.0, ans=0.07 2024-08-10 04:54:39,968 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 04:54:48,556 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 04:54:59,258 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 04:55:03,509 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 04:55:15,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=379780.0, ans=0.125 2024-08-10 04:55:22,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9000, loss[loss=0.1215, beats_loss=0.01256, ecapa_loss=0.0002544, whisper_loss=0.1064, over 22640.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01226, ecapa_loss=0.0002879, whisper_loss=0.1004, over 3891484.49 frames. ], batch size: 87, lr: 1.85e-02, grad_scale: 8388608.0 2024-08-10 04:55:22,704 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 04:56:01,298 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on ASR_libri: loss=0.2773, beats_loss=0, ecapa_loss=0.0008691, whisper_loss=0.2686, over 922467.00 frames. 2024-08-10 04:56:19,285 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on SV_voxceleb1: loss=0.007577, beats_loss=0, ecapa_loss=0.0007577, whisper_loss=0, over 939242.00 frames. 2024-08-10 04:57:22,254 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4483, 3.0748, 2.8239, 2.9334], device='cuda:0') 2024-08-10 04:58:16,655 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on AT_audioset: loss=0.02874, beats_loss=0.02874, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 04:58:16,665 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 04:58:23,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=379880.0, ans=0.1 2024-08-10 04:58:30,565 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 04:58:33,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.379e+01 3.022e+01 3.372e+01 4.052e+01 6.376e+01, threshold=6.745e+01, percent-clipped=0.0 2024-08-10 04:58:36,234 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 04:58:41,427 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 04:58:47,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=380080.0, ans=0.125 2024-08-10 04:58:48,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=380080.0, ans=22.5 2024-08-10 04:58:50,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=380080.0, ans=0.125 2024-08-10 04:58:53,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380080.0, ans=0.1 2024-08-10 04:59:16,612 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 04:59:25,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9050, loss[loss=0.1161, beats_loss=0.01205, ecapa_loss=0.0002784, whisper_loss=0.1013, over 15518.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01221, ecapa_loss=0.0002905, whisper_loss=0.1006, over 3900405.07 frames. ], batch size: 59, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 04:59:40,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2024-08-10 04:59:45,351 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 04:59:59,046 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 05:00:03,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=380580.0, ans=0.125 2024-08-10 05:00:06,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=380680.0, ans=0.0 2024-08-10 05:00:07,289 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-10 05:00:12,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=380680.0, ans=0.125 2024-08-10 05:00:16,874 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-10 05:00:28,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=380780.0, ans=0.125 2024-08-10 05:00:30,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=380780.0, ans=0.0 2024-08-10 05:00:34,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9100, loss[loss=0.1126, beats_loss=0.01162, ecapa_loss=0.000282, whisper_loss=0.09811, over 18997.00 frames. ], tot_loss[loss=0.116, beats_loss=0.01213, ecapa_loss=0.0002917, whisper_loss=0.101, over 3879249.56 frames. ], batch size: 73, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:00:35,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=380880.0, ans=0.2 2024-08-10 05:00:40,344 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 05:00:44,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=380880.0, ans=0.0 2024-08-10 05:00:51,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.817e+01 3.235e+01 3.647e+01 7.816e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 05:00:51,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-10 05:01:04,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=381080.0, ans=0.0 2024-08-10 05:01:05,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=381080.0, ans=0.0 2024-08-10 05:01:09,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=381080.0, ans=0.125 2024-08-10 05:01:13,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-08-10 05:01:16,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=381180.0, ans=0.0 2024-08-10 05:01:24,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.80 vs. limit=15.0 2024-08-10 05:01:38,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381280.0, ans=0.1 2024-08-10 05:01:43,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9150, loss[loss=0.1315, beats_loss=0.01057, ecapa_loss=0.0003031, whisper_loss=0.1179, over 16257.00 frames. 
], tot_loss[loss=0.1154, beats_loss=0.01225, ecapa_loss=0.0002891, whisper_loss=0.1003, over 3932731.53 frames. ], batch size: 62, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:01:45,165 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 05:01:49,405 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 05:01:53,506 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 05:01:56,131 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 05:02:36,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2024-08-10 05:02:47,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=381780.0, ans=0.0 2024-08-10 05:02:48,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=381780.0, ans=0.2 2024-08-10 05:02:48,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=381780.0, ans=0.09899494936611666 2024-08-10 05:02:50,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2024-08-10 05:02:52,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9200, loss[loss=0.107, beats_loss=0.01241, ecapa_loss=0.0003117, whisper_loss=0.09146, over 21788.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01226, ecapa_loss=0.0002897, whisper_loss=0.09996, over 3929869.67 frames. 
], batch size: 92, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:03:05,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2024-08-10 05:03:07,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-08-10 05:03:08,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.284e+01 3.038e+01 3.317e+01 3.849e+01 8.293e+01, threshold=6.633e+01, percent-clipped=1.0 2024-08-10 05:03:09,402 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 05:03:11,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2024-08-10 05:03:13,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-10 05:03:34,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2024-08-10 05:03:44,693 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 05:03:51,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=382280.0, ans=0.07 2024-08-10 05:03:56,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=382280.0, ans=0.125 2024-08-10 05:03:58,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=382280.0, ans=0.125 2024-08-10 05:04:00,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9250, loss[loss=0.1282, beats_loss=0.01135, ecapa_loss=0.0003507, whisper_loss=0.1133, over 22425.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01218, ecapa_loss=0.0002903, whisper_loss=0.1004, over 3913719.79 frames. ], batch size: 92, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:04:11,001 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 05:04:22,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.25 vs. limit=22.5 2024-08-10 05:04:28,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=382580.0, ans=0.125 2024-08-10 05:04:37,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2024-08-10 05:04:43,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-10 05:04:46,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=382680.0, ans=0.95 2024-08-10 05:04:47,771 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 05:04:50,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=382680.0, ans=0.0 2024-08-10 05:04:52,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.87 vs. limit=15.0 2024-08-10 05:05:09,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9300, loss[loss=0.107, beats_loss=0.01247, ecapa_loss=0.0003405, whisper_loss=0.09111, over 16074.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01215, ecapa_loss=0.000292, whisper_loss=0.1007, over 3912106.55 frames. ], batch size: 64, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:05:13,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.35 vs. limit=22.5 2024-08-10 05:05:26,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.442e+01 3.054e+01 3.480e+01 4.164e+01 1.138e+02, threshold=6.960e+01, percent-clipped=2.0 2024-08-10 05:05:29,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-08-10 05:05:46,061 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-10 05:05:54,240 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 05:06:16,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=15.0 2024-08-10 05:06:18,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9350, loss[loss=0.1143, beats_loss=0.009154, ecapa_loss=0.0003308, whisper_loss=0.1018, over 15672.00 frames. 
], tot_loss[loss=0.1158, beats_loss=0.01215, ecapa_loss=0.0002896, whisper_loss=0.1008, over 3897908.94 frames. ], batch size: 62, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:06:19,044 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-10 05:06:27,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=383380.0, ans=0.05 2024-08-10 05:06:41,661 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.629e-03 2024-08-10 05:06:51,430 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 05:06:58,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=383580.0, ans=0.0 2024-08-10 05:07:00,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=383680.0, ans=0.04949747468305833 2024-08-10 05:07:18,034 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 05:07:25,458 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 05:07:29,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9400, loss[loss=0.1179, beats_loss=0.01204, ecapa_loss=0.0002805, whisper_loss=0.103, over 20129.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.0122, ecapa_loss=0.0002906, whisper_loss=0.1001, over 3894183.16 frames. 
], batch size: 80, lr: 1.84e-02, grad_scale: 16777216.0 2024-08-10 05:07:35,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=383880.0, ans=0.0 2024-08-10 05:07:39,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383880.0, ans=0.125 2024-08-10 05:07:44,553 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 05:07:45,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+01 2.835e+01 3.411e+01 3.975e+01 7.515e+01, threshold=6.823e+01, percent-clipped=1.0 2024-08-10 05:07:47,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383980.0, ans=0.1 2024-08-10 05:07:52,652 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 27 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-10 05:07:54,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=383980.0, ans=0.125 2024-08-10 05:08:11,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=384180.0, ans=0.0 2024-08-10 05:08:15,820 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 05:08:17,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.32 vs. limit=22.5 2024-08-10 05:08:19,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-08-10 05:08:24,185 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 05:08:30,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=384280.0, ans=0.0 2024-08-10 05:08:34,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2024-08-10 05:08:37,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9450, loss[loss=0.1228, beats_loss=0.01371, ecapa_loss=0.0002437, whisper_loss=0.1067, over 23688.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01217, ecapa_loss=0.0002925, whisper_loss=0.1, over 3895118.44 frames. ], batch size: 95, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:08:51,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384480.0, ans=0.1 2024-08-10 05:08:57,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=384480.0, ans=0.125 2024-08-10 05:09:06,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=384580.0, ans=0.07 2024-08-10 05:09:23,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=384680.0, ans=0.0 2024-08-10 05:09:45,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=384880.0, ans=0.125 2024-08-10 05:09:46,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9500, loss[loss=0.1205, beats_loss=0.01413, ecapa_loss=0.0002501, whisper_loss=0.1038, over 21904.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01228, ecapa_loss=0.0002911, whisper_loss=0.09964, over 3898283.17 frames. 
], batch size: 90, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:10:02,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5 2024-08-10 05:10:03,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.035e+01 3.445e+01 3.941e+01 9.468e+01, threshold=6.890e+01, percent-clipped=2.0 2024-08-10 05:10:26,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385080.0, ans=0.125 2024-08-10 05:10:50,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=385280.0, ans=0.1 2024-08-10 05:10:55,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9550, loss[loss=0.09016, beats_loss=0.01224, ecapa_loss=0.000294, whisper_loss=0.07498, over 17383.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01222, ecapa_loss=0.0002923, whisper_loss=0.09962, over 3856877.18 frames. ], batch size: 70, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:10:55,647 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 05:10:58,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=385380.0, ans=0.0 2024-08-10 05:11:13,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=385480.0, ans=0.125 2024-08-10 05:11:18,566 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-10 05:11:28,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=385580.0, ans=0.025 2024-08-10 05:11:37,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2024-08-10 05:11:58,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=385780.0, ans=0.0 2024-08-10 05:12:02,111 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-10 05:12:04,865 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9600, loss[loss=0.1006, beats_loss=0.01237, ecapa_loss=0.0002801, whisper_loss=0.08539, over 16175.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01218, ecapa_loss=0.0002937, whisper_loss=0.09949, over 3841328.65 frames. ], batch size: 66, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:12:12,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=385880.0, ans=0.125 2024-08-10 05:12:21,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 3.010e+01 3.489e+01 4.021e+01 7.106e+01, threshold=6.979e+01, percent-clipped=1.0 2024-08-10 05:12:27,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=385980.0, ans=0.2 2024-08-10 05:12:48,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386180.0, ans=0.1 2024-08-10 05:12:49,987 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 05:12:51,389 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 05:12:54,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386180.0, ans=0.1 2024-08-10 05:13:06,510 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 05:13:08,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=386280.0, ans=0.0 2024-08-10 05:13:14,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9650, loss[loss=0.1046, beats_loss=0.01078, ecapa_loss=0.000248, whisper_loss=0.0913, over 16272.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01217, ecapa_loss=0.0002905, whisper_loss=0.09958, over 3798818.67 frames. ], batch size: 60, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:13:15,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=386380.0, ans=0.2 2024-08-10 05:13:21,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=386380.0, ans=0.125 2024-08-10 05:13:21,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=386380.0, ans=0.125 2024-08-10 05:13:24,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=386380.0, ans=0.09899494936611666 2024-08-10 05:13:25,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=386380.0, ans=0.0 2024-08-10 05:13:25,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=386380.0, ans=0.125 2024-08-10 05:14:02,607 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 05:14:10,757 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 05:14:24,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9700, loss[loss=0.08279, beats_loss=0.01673, ecapa_loss=0.000336, whisper_loss=0.0627, over 21960.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0121, ecapa_loss=0.0002921, whisper_loss=0.09997, over 3791245.90 frames. ], batch size: 95, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:14:40,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.854e+01 3.317e+01 3.898e+01 6.731e+01, threshold=6.635e+01, percent-clipped=0.0 2024-08-10 05:14:42,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=386980.0, ans=0.0 2024-08-10 05:14:44,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-08-10 05:15:00,443 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 05:15:17,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=387180.0, ans=0.2 2024-08-10 05:15:22,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.93 vs. limit=15.0 2024-08-10 05:15:33,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9750, loss[loss=0.1199, beats_loss=0.01375, ecapa_loss=0.0002148, whisper_loss=0.104, over 23642.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01218, ecapa_loss=0.0002901, whisper_loss=0.09998, over 3804728.90 frames. ], batch size: 93, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:16:07,342 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 05:16:43,168 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9800, loss[loss=0.1122, beats_loss=0.009128, ecapa_loss=0.0003842, whisper_loss=0.09924, over 13325.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01219, ecapa_loss=0.0002908, whisper_loss=0.1001, over 3788745.53 frames. ], batch size: 56, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:16:44,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=387880.0, ans=0.0 2024-08-10 05:16:49,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=387880.0, ans=0.125 2024-08-10 05:16:50,292 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 05:16:56,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=387980.0, ans=0.125 2024-08-10 05:16:59,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 2.835e+01 3.207e+01 3.802e+01 6.736e+01, threshold=6.414e+01, percent-clipped=1.0 2024-08-10 05:17:05,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=387980.0, ans=0.125 2024-08-10 05:17:07,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-10 05:17:16,466 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 05:17:22,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.59 vs. limit=22.5 2024-08-10 05:17:28,863 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 05:17:34,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=388180.0, ans=0.0 2024-08-10 05:17:51,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9850, loss[loss=0.09449, beats_loss=0.01281, ecapa_loss=0.0003199, whisper_loss=0.07848, over 21433.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01215, ecapa_loss=0.0002921, whisper_loss=0.09977, over 3798666.88 frames. ], batch size: 91, lr: 1.83e-02, grad_scale: 16777216.0 2024-08-10 05:17:52,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=388380.0, ans=0.0 2024-08-10 05:18:07,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=388480.0, ans=0.125 2024-08-10 05:18:49,678 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 05:18:50,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=15.0 2024-08-10 05:19:00,754 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9900, loss[loss=0.1041, beats_loss=0.01521, ecapa_loss=0.0002254, whisper_loss=0.08665, over 14925.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01228, ecapa_loss=0.0002904, whisper_loss=0.09949, over 3826548.76 frames. 
], batch size: 58, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:19:16,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388980.0, ans=0.1 2024-08-10 05:19:17,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.911e+01 3.357e+01 3.805e+01 2.149e+02, threshold=6.715e+01, percent-clipped=2.0 2024-08-10 05:19:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=388980.0, ans=0.025 2024-08-10 05:19:20,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=388980.0, ans=0.2 2024-08-10 05:19:31,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=389080.0, ans=0.0 2024-08-10 05:19:39,887 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 05:19:48,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=389180.0, ans=0.125 2024-08-10 05:20:01,766 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-10 05:20:03,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=389280.0, ans=0.0 2024-08-10 05:20:06,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=389280.0, ans=0.125 2024-08-10 05:20:07,255 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 15 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-10 05:20:10,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 9950, loss[loss=0.1002, beats_loss=0.01522, ecapa_loss=0.0002695, whisper_loss=0.08229, over 18881.00 frames. 
], tot_loss[loss=0.1145, beats_loss=0.01229, ecapa_loss=0.0002916, whisper_loss=0.09927, over 3816837.67 frames. ], batch size: 74, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:20:23,873 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 05:20:26,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=389480.0, ans=0.05 2024-08-10 05:20:37,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=389580.0, ans=0.125 2024-08-10 05:20:44,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389580.0, ans=0.1 2024-08-10 05:20:55,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5 2024-08-10 05:21:08,629 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 05:21:12,724 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 05:21:16,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=389880.0, ans=0.125 2024-08-10 05:21:17,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10000, loss[loss=0.1082, beats_loss=0.01203, ecapa_loss=0.0002984, whisper_loss=0.09319, over 22253.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01232, ecapa_loss=0.00029, whisper_loss=0.09906, over 3848791.64 frames. ], batch size: 91, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:21:26,349 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 05:21:27,715 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 05:21:34,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.053e+01 3.527e+01 4.199e+01 1.415e+02, threshold=7.054e+01, percent-clipped=3.0 2024-08-10 05:21:38,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=389980.0, ans=0.2 2024-08-10 05:21:48,788 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 05:21:53,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=390080.0, ans=0.125 2024-08-10 05:21:59,887 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 05:22:00,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=390180.0, ans=0.125 2024-08-10 05:22:08,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=390180.0, ans=0.0 2024-08-10 05:22:17,844 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 05:22:20,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=390280.0, ans=0.015 2024-08-10 05:22:25,122 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.584e-01 2024-08-10 05:22:27,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10050, loss[loss=0.0949, beats_loss=0.01079, ecapa_loss=0.0003241, whisper_loss=0.08086, over 13978.00 frames. ], tot_loss[loss=0.115, beats_loss=0.01222, ecapa_loss=0.000292, whisper_loss=0.09984, over 3835399.64 frames. ], batch size: 57, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:22:27,415 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 05:22:50,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-08-10 05:23:00,342 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 05:23:01,514 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 05:23:03,026 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 05:23:06,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-10 05:23:35,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10100, loss[loss=0.08513, beats_loss=0.01305, ecapa_loss=0.0003538, whisper_loss=0.06853, over 18989.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01231, ecapa_loss=0.0002901, whisper_loss=0.09938, over 3870056.73 frames. ], batch size: 81, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:23:47,824 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 05:23:49,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=390980.0, ans=0.125 2024-08-10 05:23:51,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.903e+01 3.270e+01 3.742e+01 9.283e+01, threshold=6.541e+01, percent-clipped=1.0 2024-08-10 05:24:23,246 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 05:24:27,094 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 05:24:38,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-10 05:24:44,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10150, loss[loss=0.1003, beats_loss=0.0124, ecapa_loss=0.0002929, whisper_loss=0.08494, over 19099.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01228, ecapa_loss=0.000292, whisper_loss=0.09939, over 3912297.76 frames. ], batch size: 78, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:24:49,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-10 05:25:16,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391580.0, ans=0.125 2024-08-10 05:25:18,045 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 05:25:32,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391680.0, ans=0.125 2024-08-10 05:25:46,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2024-08-10 05:25:47,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=391780.0, ans=0.05 2024-08-10 05:25:50,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=391780.0, ans=0.125 2024-08-10 05:25:57,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10200, loss[loss=0.1089, beats_loss=0.01269, ecapa_loss=0.0002552, whisper_loss=0.09367, over 19475.00 frames. 
], tot_loss[loss=0.1141, beats_loss=0.01225, ecapa_loss=0.0002945, whisper_loss=0.09894, over 3906740.85 frames. ], batch size: 77, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:25:58,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=391880.0, ans=0.125 2024-08-10 05:26:14,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 2.916e+01 3.286e+01 3.891e+01 7.167e+01, threshold=6.572e+01, percent-clipped=1.0 2024-08-10 05:26:15,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=391980.0, ans=10.0 2024-08-10 05:26:27,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2024-08-10 05:26:37,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-10 05:27:10,657 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 05:27:12,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10250, loss[loss=0.09671, beats_loss=0.01401, ecapa_loss=0.0003318, whisper_loss=0.07938, over 21540.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01226, ecapa_loss=0.000294, whisper_loss=0.09989, over 3920718.05 frames. ], batch size: 89, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:27:13,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.21 vs. 
limit=22.5 2024-08-10 05:27:44,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=392580.0, ans=0.125 2024-08-10 05:27:52,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=392580.0, ans=0.125 2024-08-10 05:28:16,312 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 05:28:27,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10300, loss[loss=0.09272, beats_loss=0.01547, ecapa_loss=0.0002342, whisper_loss=0.07491, over 19849.00 frames. ], tot_loss[loss=0.1149, beats_loss=0.0123, ecapa_loss=0.0002918, whisper_loss=0.09967, over 3930419.86 frames. ], batch size: 81, lr: 1.82e-02, grad_scale: 16777216.0 2024-08-10 05:28:46,141 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+01 3.063e+01 3.413e+01 3.835e+01 1.358e+02, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 05:28:55,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392980.0, ans=0.1 2024-08-10 05:29:19,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393180.0, ans=0.125 2024-08-10 05:29:35,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=393280.0, ans=0.07 2024-08-10 05:29:42,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=15.0 2024-08-10 05:29:45,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10350, loss[loss=0.1055, beats_loss=0.01323, ecapa_loss=0.0002563, whisper_loss=0.08972, over 20171.00 frames. ], tot_loss[loss=0.1153, beats_loss=0.0123, ecapa_loss=0.000289, whisper_loss=0.1001, over 3939551.11 frames. 
], batch size: 81, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:29:54,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=393380.0, ans=0.125 2024-08-10 05:30:01,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2024-08-10 05:30:13,863 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.606e+01 2024-08-10 05:30:24,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=393580.0, ans=0.125 2024-08-10 05:30:39,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=393680.0, ans=0.0 2024-08-10 05:30:46,642 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 05:30:55,655 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 05:31:03,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10400, loss[loss=0.1111, beats_loss=0.01215, ecapa_loss=0.0002926, whisper_loss=0.09602, over 21871.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01227, ecapa_loss=0.0002895, whisper_loss=0.1005, over 3940078.10 frames. ], batch size: 90, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:31:03,284 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 05:31:03,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=393880.0, ans=0.0 2024-08-10 05:31:04,676 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
23 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-10 05:31:09,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=393880.0, ans=0.125 2024-08-10 05:31:12,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393880.0, ans=0.1 2024-08-10 05:31:18,441 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 05:31:21,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.940e+01 3.359e+01 3.808e+01 2.361e+02, threshold=6.718e+01, percent-clipped=2.0 2024-08-10 05:31:21,405 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 05:31:28,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=393980.0, ans=0.0 2024-08-10 05:31:29,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=393980.0, ans=0.125 2024-08-10 05:31:41,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=394080.0, ans=0.125 2024-08-10 05:31:50,024 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=4.582e-01 2024-08-10 05:32:02,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 05:32:03,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2024-08-10 05:32:11,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.77 vs. 
limit=10.0 2024-08-10 05:32:16,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10450, loss[loss=0.09922, beats_loss=0.01187, ecapa_loss=0.000262, whisper_loss=0.08473, over 16214.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01227, ecapa_loss=0.0002875, whisper_loss=0.1006, over 3914173.68 frames. ], batch size: 63, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:32:16,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=394380.0, ans=0.125 2024-08-10 05:32:20,814 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 05:32:22,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=394380.0, ans=0.2 2024-08-10 05:32:28,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=394380.0, ans=0.125 2024-08-10 05:32:41,350 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-10 05:32:41,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2024-08-10 05:32:41,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2024-08-10 05:32:49,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394580.0, ans=0.125 2024-08-10 05:33:00,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=394680.0, ans=0.0 2024-08-10 05:33:14,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.35 vs. 
limit=22.5 2024-08-10 05:33:22,702 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 05:33:27,152 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 05:33:29,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2024-08-10 05:33:31,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10500, loss[loss=0.1129, beats_loss=0.01221, ecapa_loss=0.0003187, whisper_loss=0.09751, over 16039.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01238, ecapa_loss=0.0002891, whisper_loss=0.09947, over 3892841.35 frames. ], batch size: 66, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:33:40,713 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 05:33:44,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394880.0, ans=0.1 2024-08-10 05:33:44,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-08-10 05:33:48,268 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 05:33:49,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.533e+01 2.971e+01 3.381e+01 3.721e+01 5.999e+01, threshold=6.761e+01, percent-clipped=0.0 2024-08-10 05:33:52,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.90 vs. 
limit=15.0 2024-08-10 05:34:00,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=395080.0, ans=0.0 2024-08-10 05:34:03,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=395080.0, ans=0.2 2024-08-10 05:34:17,978 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 05:34:19,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=395180.0, ans=0.125 2024-08-10 05:34:38,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395280.0, ans=0.1 2024-08-10 05:34:46,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.76 vs. limit=5.0 2024-08-10 05:34:46,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10550, loss[loss=0.08789, beats_loss=0.01335, ecapa_loss=0.0002544, whisper_loss=0.07199, over 14223.00 frames. ], tot_loss[loss=0.1147, beats_loss=0.01225, ecapa_loss=0.0002925, whisper_loss=0.09955, over 3865837.73 frames. ], batch size: 55, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:34:50,133 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 05:35:06,138 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 05:35:34,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.44 vs. 
limit=15.0 2024-08-10 05:35:35,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=395680.0, ans=0.025 2024-08-10 05:35:50,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=395780.0, ans=0.2 2024-08-10 05:36:02,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10600, loss[loss=0.1045, beats_loss=0.01363, ecapa_loss=0.0002998, whisper_loss=0.0879, over 22840.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01227, ecapa_loss=0.0002926, whisper_loss=0.09873, over 3846176.95 frames. ], batch size: 93, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:36:19,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.978e+01 3.470e+01 3.932e+01 9.831e+01, threshold=6.940e+01, percent-clipped=1.0 2024-08-10 05:36:20,145 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 05:36:30,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=11.81 vs. 
limit=10.0 2024-08-10 05:36:34,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=396080.0, ans=0.09899494936611666 2024-08-10 05:36:43,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=396080.0, ans=0.0 2024-08-10 05:37:00,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=396180.0, ans=0.125 2024-08-10 05:37:02,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=396280.0, ans=0.125 2024-08-10 05:37:08,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.73 vs. limit=10.0 2024-08-10 05:37:12,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2024-08-10 05:37:17,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10650, loss[loss=0.1318, beats_loss=0.006637, ecapa_loss=0.0003718, whisper_loss=0.1214, over 15081.00 frames. ], tot_loss[loss=0.115, beats_loss=0.0122, ecapa_loss=0.0002893, whisper_loss=0.09993, over 3842023.54 frames. 
], batch size: 58, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:37:31,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=396480.0, ans=0.125 2024-08-10 05:37:33,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=396480.0, ans=0.0 2024-08-10 05:37:34,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=396480.0, ans=0.125 2024-08-10 05:37:34,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396480.0, ans=0.1 2024-08-10 05:37:34,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=396480.0, ans=0.0 2024-08-10 05:37:49,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=396580.0, ans=0.125 2024-08-10 05:37:51,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396580.0, ans=0.125 2024-08-10 05:38:05,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0 2024-08-10 05:38:15,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=396680.0, ans=0.125 2024-08-10 05:38:17,580 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 05:38:19,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=396780.0, ans=0.125 2024-08-10 05:38:19,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2024-08-10 05:38:22,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=396780.0, ans=0.125 2024-08-10 05:38:29,824 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 05:38:32,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10700, loss[loss=0.0858, beats_loss=0.01266, ecapa_loss=0.0002309, whisper_loss=0.07082, over 17684.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01223, ecapa_loss=0.000287, whisper_loss=0.1003, over 3886727.90 frames. ], batch size: 71, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:38:39,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396880.0, ans=0.1 2024-08-10 05:38:44,134 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 05:38:49,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.532e+01 3.168e+01 3.517e+01 4.154e+01 8.442e+01, threshold=7.034e+01, percent-clipped=1.0 2024-08-10 05:38:53,100 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 05:39:38,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. 
limit=15.0 2024-08-10 05:39:47,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10750, loss[loss=0.1066, beats_loss=0.01367, ecapa_loss=0.000307, whisper_loss=0.08984, over 20007.00 frames. ], tot_loss[loss=0.1155, beats_loss=0.01224, ecapa_loss=0.0002886, whisper_loss=0.1004, over 3908814.65 frames. ], batch size: 83, lr: 1.81e-02, grad_scale: 16777216.0 2024-08-10 05:39:50,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=397380.0, ans=0.125 2024-08-10 05:40:02,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=397480.0, ans=0.125 2024-08-10 05:40:24,081 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 05:40:26,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397580.0, ans=0.1 2024-08-10 05:41:02,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10800, loss[loss=0.134, beats_loss=0.01141, ecapa_loss=0.0002673, whisper_loss=0.1199, over 23626.00 frames. ], tot_loss[loss=0.1158, beats_loss=0.01222, ecapa_loss=0.0002859, whisper_loss=0.1008, over 3939756.27 frames. ], batch size: 92, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:41:20,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+01 2.911e+01 3.259e+01 3.950e+01 6.115e+01, threshold=6.518e+01, percent-clipped=0.0 2024-08-10 05:41:23,315 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 05:41:23,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397980.0, ans=0.1 2024-08-10 05:41:25,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=397980.0, ans=0.0 2024-08-10 05:41:30,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-10 05:42:18,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10850, loss[loss=0.1252, beats_loss=0.01327, ecapa_loss=0.0002752, whisper_loss=0.1091, over 23064.00 frames. ], tot_loss[loss=0.1157, beats_loss=0.01221, ecapa_loss=0.0002854, whisper_loss=0.1006, over 3900291.96 frames. ], batch size: 94, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:42:23,575 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 05:42:29,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=398380.0, ans=0.125 2024-08-10 05:42:44,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.54 vs. limit=22.5 2024-08-10 05:43:00,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0 2024-08-10 05:43:01,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=398580.0, ans=0.0 2024-08-10 05:43:35,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10900, loss[loss=0.08916, beats_loss=0.01586, ecapa_loss=0.0002148, whisper_loss=0.07115, over 18179.00 frames. 
], tot_loss[loss=0.1159, beats_loss=0.01223, ecapa_loss=0.0002859, whisper_loss=0.1008, over 3899947.58 frames. ], batch size: 71, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:43:45,416 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 05:43:50,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=398980.0, ans=0.125 2024-08-10 05:43:53,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.145e+01 3.517e+01 3.996e+01 1.577e+02, threshold=7.034e+01, percent-clipped=2.0 2024-08-10 05:43:56,005 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 05:44:01,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=398980.0, ans=0.125 2024-08-10 05:44:04,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=399080.0, ans=0.125 2024-08-10 05:44:17,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-10 05:44:20,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399180.0, ans=0.125 2024-08-10 05:44:33,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=399180.0, ans=0.2 2024-08-10 05:44:49,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 10950, loss[loss=0.1109, beats_loss=0.008531, ecapa_loss=0.0003147, whisper_loss=0.09923, over 16695.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01228, ecapa_loss=0.0002851, whisper_loss=0.1005, over 3892644.98 frames. 
], batch size: 68, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:45:02,183 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 05:45:05,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=399480.0, ans=0.5 2024-08-10 05:45:11,129 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-10 05:45:11,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399480.0, ans=0.1 2024-08-10 05:45:16,962 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 05:45:53,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-08-10 05:45:58,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=399780.0, ans=0.125 2024-08-10 05:46:05,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11000, loss[loss=0.1056, beats_loss=0.01266, ecapa_loss=0.0003611, whisper_loss=0.08936, over 22080.00 frames. ], tot_loss[loss=0.1151, beats_loss=0.01224, ecapa_loss=0.0002888, whisper_loss=0.1, over 3910153.69 frames. ], batch size: 96, lr: 1.80e-02, grad_scale: 16777216.0 2024-08-10 05:46:07,804 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
27 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 05:46:22,708 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-40000.pt 2024-08-10 05:46:26,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.898e+01 3.404e+01 3.976e+01 6.521e+01, threshold=6.808e+01, percent-clipped=0.0 2024-08-10 05:46:46,253 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 05:46:51,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2024-08-10 05:46:52,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-10 05:47:08,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=400280.0, ans=0.125 2024-08-10 05:47:21,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400380.0, ans=0.1 2024-08-10 05:47:21,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11050, loss[loss=0.1062, beats_loss=0.01375, ecapa_loss=0.0002146, whisper_loss=0.09028, over 15119.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01214, ecapa_loss=0.0002903, whisper_loss=0.1004, over 3898367.61 frames. 
], batch size: 57, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:47:26,642 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.054e-02 2024-08-10 05:47:29,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400380.0, ans=0.1 2024-08-10 05:47:31,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=400380.0, ans=0.125 2024-08-10 05:47:48,717 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-10 05:48:08,340 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 05:48:17,991 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 05:48:25,903 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 25 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-10 05:48:28,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=400780.0, ans=0.04949747468305833 2024-08-10 05:48:34,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11100, loss[loss=0.1203, beats_loss=0.01079, ecapa_loss=0.0002669, whisper_loss=0.1069, over 18927.00 frames. ], tot_loss[loss=0.1152, beats_loss=0.01224, ecapa_loss=0.0002877, whisper_loss=0.1, over 3896138.54 frames. ], batch size: 71, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:48:52,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.998e+01 3.322e+01 3.680e+01 7.626e+01, threshold=6.644e+01, percent-clipped=1.0 2024-08-10 05:48:53,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400980.0, ans=0.1 2024-08-10 05:48:59,645 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 05:49:05,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=401080.0, ans=0.125 2024-08-10 05:49:07,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=401080.0, ans=0.125 2024-08-10 05:49:11,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=401080.0, ans=0.2 2024-08-10 05:49:11,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-10 05:49:41,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.81 vs. limit=10.0 2024-08-10 05:49:42,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=401280.0, ans=0.125 2024-08-10 05:49:44,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401280.0, ans=0.1 2024-08-10 05:49:50,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11150, loss[loss=0.1074, beats_loss=0.01262, ecapa_loss=0.0003084, whisper_loss=0.09172, over 18547.00 frames. ], tot_loss[loss=0.1154, beats_loss=0.01218, ecapa_loss=0.0002855, whisper_loss=0.1004, over 3896692.24 frames. ], batch size: 73, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:50:05,529 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-10 05:50:33,155 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 05:50:45,363 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
33 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-10 05:50:53,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=401780.0, ans=0.125 2024-08-10 05:51:01,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11200, loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.0003332, whisper_loss=0.09127, over 13991.00 frames. ], tot_loss[loss=0.1162, beats_loss=0.01209, ecapa_loss=0.0002872, whisper_loss=0.1012, over 3900286.10 frames. ], batch size: 55, lr: 1.80e-02, grad_scale: 33554432.0 2024-08-10 05:51:17,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=401980.0, ans=0.125 2024-08-10 05:51:18,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.098e+01 3.521e+01 4.109e+01 7.831e+01, threshold=7.041e+01, percent-clipped=1.0 2024-08-10 05:51:37,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=15.0 2024-08-10 05:51:54,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=402180.0, ans=0.1 2024-08-10 05:51:55,408 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 05:52:16,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11250, loss[loss=0.1083, beats_loss=0.01021, ecapa_loss=0.0003286, whisper_loss=0.09482, over 14213.00 frames. ], tot_loss[loss=0.1156, beats_loss=0.01215, ecapa_loss=0.0002888, whisper_loss=0.1006, over 3878379.85 frames. ], batch size: 54, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:52:20,594 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 05:52:24,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.87 vs. limit=15.0 2024-08-10 05:52:29,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=402480.0, ans=0.04949747468305833 2024-08-10 05:52:31,344 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 05:52:53,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=402580.0, ans=0.125 2024-08-10 05:52:54,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=402580.0, ans=0.0 2024-08-10 05:53:12,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=402780.0, ans=0.125 2024-08-10 05:53:13,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402780.0, ans=0.125 2024-08-10 05:53:22,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=402780.0, ans=0.5 2024-08-10 05:53:27,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11300, loss[loss=0.1169, beats_loss=0.013, ecapa_loss=0.000253, whisper_loss=0.1013, over 14989.00 frames. ], tot_loss[loss=0.1148, beats_loss=0.01216, ecapa_loss=0.0002855, whisper_loss=0.09979, over 3895341.33 frames. ], batch size: 58, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:53:27,849 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-10 05:53:32,480 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 05:53:44,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.975e+01 3.482e+01 3.976e+01 1.269e+02, threshold=6.963e+01, percent-clipped=1.0 2024-08-10 05:53:46,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=402980.0, ans=0.0 2024-08-10 05:53:47,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=402980.0, ans=0.125 2024-08-10 05:54:00,502 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 05:54:02,952 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 05:54:15,477 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 05:54:17,048 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 05:54:25,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=403280.0, ans=0.125 2024-08-10 05:54:39,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11350, loss[loss=0.1139, beats_loss=0.01113, ecapa_loss=0.0002924, whisper_loss=0.09987, over 23332.00 frames. ], tot_loss[loss=0.1142, beats_loss=0.01221, ecapa_loss=0.0002852, whisper_loss=0.09919, over 3897671.84 frames. ], batch size: 93, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:54:44,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. 
limit=15.0 2024-08-10 05:54:51,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=403380.0, ans=0.2 2024-08-10 05:55:10,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=403580.0, ans=0.125 2024-08-10 05:55:10,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=403580.0, ans=0.0 2024-08-10 05:55:10,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-10 05:55:48,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=403780.0, ans=0.0 2024-08-10 05:55:55,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11400, loss[loss=0.1118, beats_loss=0.01419, ecapa_loss=0.0002537, whisper_loss=0.09506, over 22828.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01225, ecapa_loss=0.0002863, whisper_loss=0.09887, over 3856063.27 frames. ], batch size: 91, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:56:13,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.448e+01 3.091e+01 3.465e+01 3.981e+01 8.996e+01, threshold=6.931e+01, percent-clipped=1.0 2024-08-10 05:56:16,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2024-08-10 05:56:48,721 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
17 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 05:56:55,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=404280.0, ans=0.5 2024-08-10 05:57:08,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11450, loss[loss=0.1226, beats_loss=0.0114, ecapa_loss=0.0002964, whisper_loss=0.1082, over 21795.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01231, ecapa_loss=0.0002869, whisper_loss=0.09858, over 3862298.48 frames. ], batch size: 88, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:57:11,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404380.0, ans=0.125 2024-08-10 05:57:16,543 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 05:57:27,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=404480.0, ans=0.125 2024-08-10 05:57:32,138 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 34 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 05:57:33,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404480.0, ans=0.1 2024-08-10 05:57:36,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-10 05:57:40,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. 
limit=6.0 2024-08-10 05:58:01,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=404680.0, ans=0.125 2024-08-10 05:58:16,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=404780.0, ans=0.2 2024-08-10 05:58:24,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11500, loss[loss=0.09141, beats_loss=0.01408, ecapa_loss=0.0002564, whisper_loss=0.07476, over 15748.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01235, ecapa_loss=0.0002883, whisper_loss=0.09868, over 3870479.73 frames. ], batch size: 67, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:58:25,037 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 05:58:42,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.645e+01 3.195e+01 3.620e+01 4.078e+01 2.789e+02, threshold=7.241e+01, percent-clipped=1.0 2024-08-10 05:58:59,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2024-08-10 05:59:11,000 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 05:59:11,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=405180.0, ans=0.0 2024-08-10 05:59:11,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.59 vs. 
limit=10.0 2024-08-10 05:59:14,059 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 05:59:18,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=405180.0, ans=0.0 2024-08-10 05:59:26,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=405280.0, ans=0.125 2024-08-10 05:59:38,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11550, loss[loss=0.1032, beats_loss=0.01452, ecapa_loss=0.0002811, whisper_loss=0.08592, over 20399.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01226, ecapa_loss=0.000291, whisper_loss=0.09885, over 3827243.43 frames. ], batch size: 84, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 05:59:42,752 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 05:59:53,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=1.87 vs. limit=15.0 2024-08-10 06:00:07,267 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 06:00:13,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=405580.0, ans=0.125 2024-08-10 06:00:41,337 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 06:00:54,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11600, loss[loss=0.1281, beats_loss=0.01196, ecapa_loss=0.0002833, whisper_loss=0.1133, over 22991.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01232, ecapa_loss=0.0002909, whisper_loss=0.09828, over 3868428.16 frames. 
], batch size: 91, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:01:02,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=405880.0, ans=0.125 2024-08-10 06:01:07,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=405980.0, ans=0.125 2024-08-10 06:01:08,783 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 06:01:11,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.511e+01 3.361e+01 3.673e+01 4.425e+01 6.331e+01, threshold=7.346e+01, percent-clipped=0.0 2024-08-10 06:01:12,144 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 06:01:15,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=405980.0, ans=0.125 2024-08-10 06:01:31,265 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 06:01:55,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=406280.0, ans=0.125 2024-08-10 06:01:56,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=406280.0, ans=0.0 2024-08-10 06:02:06,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11650, loss[loss=0.1207, beats_loss=0.01361, ecapa_loss=0.0002124, whisper_loss=0.105, over 22883.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01234, ecapa_loss=0.0002898, whisper_loss=0.09829, over 3901643.15 frames. ], batch size: 88, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:02:12,462 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
21 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 06:02:36,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=406580.0, ans=0.125 2024-08-10 06:02:43,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=406580.0, ans=15.0 2024-08-10 06:02:46,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=406580.0, ans=0.0 2024-08-10 06:03:00,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=406680.0, ans=0.125 2024-08-10 06:03:11,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406780.0, ans=0.1 2024-08-10 06:03:12,813 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 06:03:16,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11700, loss[loss=0.08912, beats_loss=0.01671, ecapa_loss=0.0002298, whisper_loss=0.07011, over 15144.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01237, ecapa_loss=0.0002878, whisper_loss=0.09859, over 3919365.20 frames. ], batch size: 61, lr: 1.79e-02, grad_scale: 33554432.0 2024-08-10 06:03:25,619 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
36 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 06:03:31,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=406980.0, ans=0.125 2024-08-10 06:03:33,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.451e+01 3.237e+01 3.576e+01 4.266e+01 6.520e+01, threshold=7.151e+01, percent-clipped=0.0 2024-08-10 06:03:51,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=407080.0, ans=0.025 2024-08-10 06:03:54,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=407080.0, ans=0.025 2024-08-10 06:04:20,508 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-10 06:04:23,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=407280.0, ans=0.0 2024-08-10 06:04:26,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=407280.0, ans=0.0 2024-08-10 06:04:28,047 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-10 06:04:28,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11750, loss[loss=0.1308, beats_loss=0.01371, ecapa_loss=0.0002326, whisper_loss=0.1147, over 23252.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01242, ecapa_loss=0.0002881, whisper_loss=0.09839, over 3934520.54 frames. ], batch size: 91, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:04:45,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. 
limit=22.5 2024-08-10 06:04:47,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=407480.0, ans=0.09899494936611666 2024-08-10 06:04:58,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-10 06:05:10,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407580.0, ans=0.1 2024-08-10 06:05:31,918 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 06:05:40,178 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 06:05:43,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11800, loss[loss=0.1178, beats_loss=0.01263, ecapa_loss=0.0002839, whisper_loss=0.1024, over 22222.00 frames. ], tot_loss[loss=0.1139, beats_loss=0.01237, ecapa_loss=0.0002872, whisper_loss=0.09865, over 3941594.07 frames. ], batch size: 86, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:05:53,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=407880.0, ans=0.0 2024-08-10 06:05:59,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+01 3.074e+01 3.455e+01 3.897e+01 7.543e+01, threshold=6.910e+01, percent-clipped=1.0 2024-08-10 06:06:02,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=407980.0, ans=0.0 2024-08-10 06:06:17,676 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 06:06:21,849 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 06:06:46,315 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 06:06:48,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2024-08-10 06:06:53,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11850, loss[loss=0.09745, beats_loss=0.01116, ecapa_loss=0.0003085, whisper_loss=0.08321, over 16909.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01241, ecapa_loss=0.0002869, whisper_loss=0.09822, over 3951773.68 frames. ], batch size: 69, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:07:00,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2024-08-10 06:07:12,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=408480.0, ans=0.125 2024-08-10 06:07:35,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408680.0, ans=0.1 2024-08-10 06:07:39,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=408680.0, ans=0.0 2024-08-10 06:08:03,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11900, loss[loss=0.08117, beats_loss=0.01849, ecapa_loss=0.0001947, whisper_loss=0.06074, over 17316.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01242, ecapa_loss=0.0002869, whisper_loss=0.09845, over 3936086.19 frames. 
], batch size: 70, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:08:09,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=408880.0, ans=0.0 2024-08-10 06:08:11,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=408880.0, ans=0.0 2024-08-10 06:08:11,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.91 vs. limit=10.0 2024-08-10 06:08:16,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=408980.0, ans=0.0 2024-08-10 06:08:20,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.628e+01 3.266e+01 3.553e+01 4.247e+01 1.215e+02, threshold=7.106e+01, percent-clipped=1.0 2024-08-10 06:08:22,166 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 06:08:26,159 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 06:08:40,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=409080.0, ans=0.0 2024-08-10 06:08:42,928 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 06:08:44,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=409180.0, ans=0.0 2024-08-10 06:09:12,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409380.0, ans=0.1 2024-08-10 06:09:13,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 11950, loss[loss=0.1229, beats_loss=0.01174, ecapa_loss=0.0002878, whisper_loss=0.1082, over 16399.00 frames. 
], tot_loss[loss=0.1134, beats_loss=0.01238, ecapa_loss=0.00029, whisper_loss=0.09815, over 3919104.08 frames. ], batch size: 66, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:09:26,184 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 06:09:33,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=409480.0, ans=0.2 2024-08-10 06:09:47,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2024-08-10 06:09:54,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-10 06:09:55,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=409680.0, ans=0.125 2024-08-10 06:09:56,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=409680.0, ans=0.0 2024-08-10 06:10:03,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.45 vs. limit=22.5 2024-08-10 06:10:14,986 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 06:10:15,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=409780.0, ans=0.02 2024-08-10 06:10:19,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=409780.0, ans=0.125 2024-08-10 06:10:20,102 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 06:10:22,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12000, loss[loss=0.1336, beats_loss=0.01236, ecapa_loss=0.0002384, whisper_loss=0.1189, over 23941.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01243, ecapa_loss=0.0002885, whisper_loss=0.09762, over 3899009.07 frames. ], batch size: 89, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:10:22,755 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 06:11:01,944 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on ASR_libri: loss=0.2695, beats_loss=0, ecapa_loss=0.000863, whisper_loss=0.2608, over 922467.00 frames. 2024-08-10 06:11:16,869 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.8011, 1.2721, 1.5108, 0.9902, 1.8563, 1.6036, 1.6059, 1.5265], device='cuda:0') 2024-08-10 06:11:17,789 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on SV_voxceleb1: loss=0.007635, beats_loss=0, ecapa_loss=0.0007635, whisper_loss=0, over 939242.00 frames. 2024-08-10 06:13:11,075 INFO [train_multi_KD3.py:1149] (0/4) Epoch 3, validation on AT_audioset: loss=0.0284, beats_loss=0.0284, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 06:13:11,080 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 06:13:28,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.155e+01 3.494e+01 4.116e+01 7.765e+01, threshold=6.989e+01, percent-clipped=1.0 2024-08-10 06:13:34,294 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 06:13:40,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=410080.0, ans=0.125 2024-08-10 06:13:40,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=410080.0, ans=0.0 2024-08-10 06:13:51,178 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 06:14:09,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0 2024-08-10 06:14:23,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12050, loss[loss=0.1423, beats_loss=0.01222, ecapa_loss=0.0003016, whisper_loss=0.1271, over 17739.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01236, ecapa_loss=0.000288, whisper_loss=0.09802, over 3889698.73 frames. ], batch size: 68, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:14:24,837 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 06:14:25,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=410380.0, ans=0.125 2024-08-10 06:14:33,091 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 06:15:00,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. 
limit=15.0 2024-08-10 06:15:07,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=410680.0, ans=0.0 2024-08-10 06:15:23,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=410780.0, ans=0.125 2024-08-10 06:15:26,338 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 06:15:33,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12100, loss[loss=0.1158, beats_loss=0.01169, ecapa_loss=0.0002391, whisper_loss=0.1017, over 16900.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01233, ecapa_loss=0.0002905, whisper_loss=0.09818, over 3893670.41 frames. ], batch size: 63, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:15:49,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.418e+01 3.160e+01 3.535e+01 4.240e+01 9.123e+01, threshold=7.071e+01, percent-clipped=3.0 2024-08-10 06:15:50,992 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 06:15:59,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=411080.0, ans=0.125 2024-08-10 06:16:10,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=411080.0, ans=0.125 2024-08-10 06:16:12,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=411080.0, ans=0.125 2024-08-10 06:16:24,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-10 06:16:34,982 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 06:16:35,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=31.59 vs. limit=22.5 2024-08-10 06:16:41,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12150, loss[loss=0.09667, beats_loss=0.0144, ecapa_loss=0.0002576, whisper_loss=0.07969, over 22885.00 frames. ], tot_loss[loss=0.113, beats_loss=0.0124, ecapa_loss=0.0002911, whisper_loss=0.09768, over 3860753.69 frames. ], batch size: 92, lr: 1.78e-02, grad_scale: 33554432.0 2024-08-10 06:16:42,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=411380.0, ans=0.125 2024-08-10 06:16:47,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=411380.0, ans=0.125 2024-08-10 06:16:50,271 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-10 06:16:57,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=411480.0, ans=0.125 2024-08-10 06:17:00,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=411480.0, ans=0.0 2024-08-10 06:17:06,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=411480.0, ans=0.125 2024-08-10 06:17:08,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=411580.0, ans=0.125 2024-08-10 06:17:12,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=411580.0, ans=0.2 2024-08-10 06:17:16,392 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
33 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 06:17:20,707 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 06:17:42,934 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 06:17:47,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0 2024-08-10 06:17:48,401 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 06:17:48,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=411780.0, ans=0.125 2024-08-10 06:17:50,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12200, loss[loss=0.1322, beats_loss=0.008941, ecapa_loss=0.0003082, whisper_loss=0.1202, over 23007.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01237, ecapa_loss=0.0002888, whisper_loss=0.09803, over 3855456.46 frames. ], batch size: 89, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:17:54,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2024-08-10 06:17:56,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=411880.0, ans=0.0 2024-08-10 06:18:07,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.664e+01 3.192e+01 3.663e+01 4.187e+01 6.724e+01, threshold=7.326e+01, percent-clipped=0.0 2024-08-10 06:18:08,230 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 06:18:25,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.87 vs. 
limit=10.0 2024-08-10 06:18:26,683 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.684e-01 2024-08-10 06:18:29,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=412080.0, ans=0.125 2024-08-10 06:18:39,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=412180.0, ans=0.0 2024-08-10 06:19:00,333 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 06:19:03,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12250, loss[loss=0.1053, beats_loss=0.01279, ecapa_loss=0.0002874, whisper_loss=0.08961, over 20576.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01241, ecapa_loss=0.0002898, whisper_loss=0.09767, over 3852796.35 frames. ], batch size: 83, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:19:05,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=15.0 2024-08-10 06:19:26,134 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 06:19:41,020 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 06:19:52,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=412680.0, ans=0.2 2024-08-10 06:20:05,879 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 06:20:12,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12300, loss[loss=0.08441, beats_loss=0.01532, ecapa_loss=0.0003027, whisper_loss=0.06607, over 14255.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01236, ecapa_loss=0.0002902, whisper_loss=0.09783, over 3880874.57 frames. 
], batch size: 63, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:20:20,858 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 06:20:28,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.503e+01 3.356e+01 3.807e+01 4.575e+01 1.219e+02, threshold=7.614e+01, percent-clipped=2.0 2024-08-10 06:20:43,151 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 06:20:44,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2024-08-10 06:20:49,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.40 vs. limit=15.0 2024-08-10 06:20:51,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=12.0 2024-08-10 06:20:54,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-10 06:21:01,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=37.39 vs. limit=22.5 2024-08-10 06:21:21,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12350, loss[loss=0.1097, beats_loss=0.01217, ecapa_loss=0.0002981, whisper_loss=0.09451, over 22010.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01242, ecapa_loss=0.0002899, whisper_loss=0.0974, over 3889763.13 frames. ], batch size: 90, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:21:26,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=413380.0, ans=0.125 2024-08-10 06:21:34,436 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 06:21:46,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=413480.0, ans=0.125 2024-08-10 06:21:55,863 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-10 06:22:01,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2024-08-10 06:22:30,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12400, loss[loss=0.1082, beats_loss=0.01412, ecapa_loss=0.0002388, whisper_loss=0.09172, over 21879.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01239, ecapa_loss=0.000287, whisper_loss=0.0979, over 3900192.16 frames. ], batch size: 88, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:22:34,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2024-08-10 06:22:36,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=413880.0, ans=0.2 2024-08-10 06:22:47,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.307e+01 3.123e+01 3.503e+01 4.019e+01 1.294e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 06:23:39,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12450, loss[loss=0.1222, beats_loss=0.01253, ecapa_loss=0.0003126, whisper_loss=0.1066, over 19017.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01236, ecapa_loss=0.0002888, whisper_loss=0.09826, over 3896051.86 frames. ], batch size: 78, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:23:53,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.75 vs. 
limit=22.5 2024-08-10 06:23:58,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=414480.0, ans=0.0 2024-08-10 06:24:13,360 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 06:24:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=414580.0, ans=0.0 2024-08-10 06:24:32,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=414680.0, ans=0.2 2024-08-10 06:24:48,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-10 06:24:49,275 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12500, loss[loss=0.111, beats_loss=0.0112, ecapa_loss=0.0003376, whisper_loss=0.09638, over 22147.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01226, ecapa_loss=0.000289, whisper_loss=0.09862, over 3864205.75 frames. ], batch size: 92, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:24:52,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=414880.0, ans=0.0 2024-08-10 06:24:56,421 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 06:25:06,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.398e+01 3.279e+01 3.697e+01 4.212e+01 5.815e+01, threshold=7.393e+01, percent-clipped=0.0 2024-08-10 06:25:16,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=415080.0, ans=0.0 2024-08-10 06:25:27,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415080.0, ans=0.1 2024-08-10 06:25:30,434 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 06:25:33,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=415180.0, ans=0.04949747468305833 2024-08-10 06:25:34,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2024-08-10 06:25:50,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=415280.0, ans=0.0 2024-08-10 06:26:01,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12550, loss[loss=0.1096, beats_loss=0.01078, ecapa_loss=0.0002427, whisper_loss=0.09639, over 14728.00 frames. ], tot_loss[loss=0.1138, beats_loss=0.01229, ecapa_loss=0.000288, whisper_loss=0.09866, over 3892163.56 frames. ], batch size: 56, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:26:05,091 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.943e+05 2024-08-10 06:26:12,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=415380.0, ans=0.125 2024-08-10 06:26:28,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=415480.0, ans=0.1 2024-08-10 06:26:39,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=415580.0, ans=0.2 2024-08-10 06:26:48,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415680.0, ans=0.0 2024-08-10 06:27:02,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=415780.0, ans=0.125 2024-08-10 06:27:14,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12600, loss[loss=0.1045, beats_loss=0.01236, ecapa_loss=0.0003023, whisper_loss=0.0891, over 
15204.00 frames. ], tot_loss[loss=0.114, beats_loss=0.01229, ecapa_loss=0.0002871, whisper_loss=0.09885, over 3878181.23 frames. ], batch size: 60, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:27:35,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.630e+01 3.161e+01 3.514e+01 4.071e+01 6.890e+01, threshold=7.028e+01, percent-clipped=0.0 2024-08-10 06:27:41,762 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 06:27:42,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=415980.0, ans=0.025 2024-08-10 06:27:43,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=415980.0, ans=0.0 2024-08-10 06:27:45,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415980.0, ans=0.1 2024-08-10 06:27:48,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=416080.0, ans=0.125 2024-08-10 06:27:58,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=22.5 2024-08-10 06:28:10,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-08-10 06:28:31,879 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 06:28:38,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12650, loss[loss=0.1373, beats_loss=0.01181, ecapa_loss=0.0003133, whisper_loss=0.1224, over 18615.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.0124, ecapa_loss=0.0002882, whisper_loss=0.09786, over 3874331.46 frames. 
], batch size: 75, lr: 1.77e-02, grad_scale: 33554432.0 2024-08-10 06:28:38,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=416380.0, ans=0.025 2024-08-10 06:28:58,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.38 vs. limit=15.0 2024-08-10 06:29:08,890 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-10 06:29:09,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=416580.0, ans=0.0 2024-08-10 06:29:57,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=416780.0, ans=0.2 2024-08-10 06:30:00,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12700, loss[loss=0.1105, beats_loss=0.01012, ecapa_loss=0.0003421, whisper_loss=0.09699, over 22571.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01236, ecapa_loss=0.0002859, whisper_loss=0.09792, over 3876013.97 frames. ], batch size: 94, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:30:00,263 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 06:30:23,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.052e+01 3.376e+01 3.987e+01 6.626e+01, threshold=6.752e+01, percent-clipped=0.0 2024-08-10 06:30:27,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=416980.0, ans=0.125 2024-08-10 06:30:49,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=417080.0, ans=0.2 2024-08-10 06:30:52,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417080.0, ans=0.1 2024-08-10 06:31:10,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=417180.0, ans=0.5 2024-08-10 06:31:25,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-10 06:31:27,359 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 06:31:33,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417280.0, ans=0.125 2024-08-10 06:31:40,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12750, loss[loss=0.1097, beats_loss=0.01374, ecapa_loss=0.0002874, whisper_loss=0.09304, over 20710.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01228, ecapa_loss=0.0002894, whisper_loss=0.09843, over 3895969.70 frames. ], batch size: 84, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:31:45,820 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-10 06:31:49,580 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 06:32:01,040 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 06:32:15,269 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 06:32:43,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=417680.0, ans=0.125 2024-08-10 06:32:54,092 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-10 06:33:19,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=417880.0, ans=0.0 2024-08-10 06:33:20,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12800, loss[loss=0.112, beats_loss=0.01544, ecapa_loss=0.0003537, whisper_loss=0.093, over 18229.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01228, ecapa_loss=0.0002922, whisper_loss=0.09888, over 3895345.03 frames. ], batch size: 74, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:33:24,929 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-10 06:33:25,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417880.0, ans=0.1 2024-08-10 06:33:42,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.343e+01 3.114e+01 3.592e+01 4.168e+01 8.043e+01, threshold=7.184e+01, percent-clipped=1.0 2024-08-10 06:33:46,990 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-10 06:33:53,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=417980.0, ans=0.125 2024-08-10 06:34:18,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=418180.0, ans=0.07 2024-08-10 06:34:21,226 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 06:34:32,640 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 06:34:55,041 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 06:34:59,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12850, loss[loss=0.08957, beats_loss=0.0112, ecapa_loss=0.0003165, whisper_loss=0.07521, over 12965.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.01224, ecapa_loss=0.0002915, whisper_loss=0.0989, over 3865463.70 frames. ], batch size: 54, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:35:12,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=418480.0, ans=0.125 2024-08-10 06:35:35,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=418580.0, ans=0.02 2024-08-10 06:36:05,967 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 06:36:09,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12900, loss[loss=0.1068, beats_loss=0.009751, ecapa_loss=0.0003062, whisper_loss=0.09403, over 17113.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01231, ecapa_loss=0.0002908, whisper_loss=0.09809, over 3876097.23 frames. ], batch size: 68, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:36:21,019 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 06:36:26,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.308e+01 3.152e+01 3.621e+01 4.177e+01 6.125e+01, threshold=7.242e+01, percent-clipped=0.0 2024-08-10 06:36:39,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=419080.0, ans=0.125 2024-08-10 06:36:50,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419180.0, ans=0.125 2024-08-10 06:36:57,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=419180.0, ans=0.0 2024-08-10 06:36:59,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=419180.0, ans=0.035 2024-08-10 06:37:13,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-10 06:37:19,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 12950, loss[loss=0.119, beats_loss=0.01065, ecapa_loss=0.0003232, whisper_loss=0.1051, over 15593.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01226, ecapa_loss=0.0002912, whisper_loss=0.09735, over 3847078.77 frames. ], batch size: 62, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:37:28,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=419380.0, ans=0.05 2024-08-10 06:38:04,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419680.0, ans=0.125 2024-08-10 06:38:05,221 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 06:38:05,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=419680.0, ans=0.2 2024-08-10 06:38:06,514 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 06:38:18,496 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 06:38:23,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-10 06:38:26,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=419780.0, ans=0.125 2024-08-10 06:38:28,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13000, loss[loss=0.1085, beats_loss=0.01303, ecapa_loss=0.0002691, whisper_loss=0.09277, over 18769.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01235, ecapa_loss=0.0002917, whisper_loss=0.09769, over 3863198.25 frames. ], batch size: 76, lr: 1.76e-02, grad_scale: 33554432.0 2024-08-10 06:38:36,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=419880.0, ans=0.125 2024-08-10 06:38:40,278 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 06:38:45,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=419980.0, ans=0.125 2024-08-10 06:38:45,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 3.317e+01 3.869e+01 4.527e+01 7.040e+01, threshold=7.738e+01, percent-clipped=0.0 2024-08-10 06:39:11,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=420180.0, ans=0.125 2024-08-10 06:39:42,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13050, loss[loss=0.0581, beats_loss=0.01772, ecapa_loss=0.0002647, whisper_loss=0.03773, over 17891.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01236, ecapa_loss=0.0002902, whisper_loss=0.09794, over 3847691.43 frames. ], batch size: 77, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:39:49,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=420380.0, ans=0.125 2024-08-10 06:39:52,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420380.0, ans=0.0 2024-08-10 06:39:59,080 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 06:39:59,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=420480.0, ans=0.125 2024-08-10 06:40:00,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=420480.0, ans=0.0 2024-08-10 06:40:17,746 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 06:40:19,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=420580.0, ans=0.125 2024-08-10 06:40:25,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=420680.0, ans=0.04949747468305833 2024-08-10 06:40:29,904 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 06:40:35,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-08-10 06:40:40,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=420780.0, ans=0.125 2024-08-10 06:40:49,959 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 06:40:55,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-10 06:40:56,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13100, loss[loss=0.1149, beats_loss=0.01084, ecapa_loss=0.0002768, whisper_loss=0.1013, over 22347.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01246, ecapa_loss=0.0002886, whisper_loss=0.09723, over 3848590.11 frames. ], batch size: 93, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:41:01,875 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 06:41:02,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=420880.0, ans=0.2 2024-08-10 06:41:11,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=420980.0, ans=0.125 2024-08-10 06:41:14,674 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.548e+01 3.107e+01 3.501e+01 3.954e+01 7.732e+01, threshold=7.002e+01, percent-clipped=0.0 2024-08-10 06:41:24,840 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 06:41:29,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=421080.0, ans=0.2 2024-08-10 06:42:06,450 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 06:42:12,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13150, loss[loss=0.1059, beats_loss=0.01302, ecapa_loss=0.0002953, whisper_loss=0.0899, over 21652.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01246, ecapa_loss=0.0002864, whisper_loss=0.09692, over 3824745.18 frames. ], batch size: 93, lr: 1.76e-02, grad_scale: 67108864.0 2024-08-10 06:42:18,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=421380.0, ans=0.2 2024-08-10 06:42:31,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.07 vs. limit=22.5 2024-08-10 06:42:41,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421580.0, ans=0.1 2024-08-10 06:43:25,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13200, loss[loss=0.1049, beats_loss=0.01176, ecapa_loss=0.000283, whisper_loss=0.09029, over 17654.00 frames. 
], tot_loss[loss=0.1126, beats_loss=0.0124, ecapa_loss=0.0002857, whisper_loss=0.09739, over 3778612.46 frames. ], batch size: 71, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:43:41,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.53 vs. limit=5.0 2024-08-10 06:43:42,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+01 3.092e+01 3.479e+01 4.168e+01 6.203e+01, threshold=6.958e+01, percent-clipped=0.0 2024-08-10 06:43:48,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=421980.0, ans=0.1 2024-08-10 06:43:48,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-08-10 06:43:54,315 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 06:44:03,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=422080.0, ans=0.2 2024-08-10 06:44:08,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=422080.0, ans=0.0 2024-08-10 06:44:15,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=422180.0, ans=0.125 2024-08-10 06:44:30,822 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 06:44:34,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422280.0, ans=0.125 2024-08-10 06:44:38,993 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 06:44:40,597 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 06:44:41,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13250, loss[loss=0.1208, beats_loss=0.01205, ecapa_loss=0.0002804, whisper_loss=0.106, over 14354.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.0123, ecapa_loss=0.0002871, whisper_loss=0.09798, over 3807473.90 frames. ], batch size: 59, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:44:53,706 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 06:44:54,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-10 06:44:58,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=422480.0, ans=0.125 2024-08-10 06:45:07,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=1.93 vs. limit=15.0 2024-08-10 06:45:17,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2024-08-10 06:45:18,975 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 06:45:42,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.20 vs. 
limit=15.0 2024-08-10 06:45:43,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=422780.0, ans=0.125 2024-08-10 06:45:51,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=422780.0, ans=0.0 2024-08-10 06:45:57,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13300, loss[loss=0.1438, beats_loss=0.009846, ecapa_loss=0.0003066, whisper_loss=0.1309, over 16535.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01226, ecapa_loss=0.0002882, whisper_loss=0.09753, over 3821129.46 frames. ], batch size: 63, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:45:57,718 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 06:45:58,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=422880.0, ans=0.05 2024-08-10 06:46:00,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422880.0, ans=0.1 2024-08-10 06:46:15,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.602e+01 3.388e+01 3.671e+01 4.200e+01 6.497e+01, threshold=7.342e+01, percent-clipped=0.0 2024-08-10 06:46:20,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=422980.0, ans=0.125 2024-08-10 06:46:22,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=422980.0, ans=0.125 2024-08-10 06:46:31,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=423080.0, ans=0.125 2024-08-10 06:46:33,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, 
metric=2.47 vs. limit=15.0 2024-08-10 06:46:41,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=423180.0, ans=0.025 2024-08-10 06:46:45,053 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-10 06:46:52,087 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 06:46:53,903 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-10 06:46:55,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=423280.0, ans=0.125 2024-08-10 06:46:58,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=22.5 2024-08-10 06:46:59,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=12.0 2024-08-10 06:47:09,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=423380.0, ans=0.125 2024-08-10 06:47:10,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13350, loss[loss=0.1004, beats_loss=0.01253, ecapa_loss=0.0002385, whisper_loss=0.08549, over 18716.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01228, ecapa_loss=0.0002869, whisper_loss=0.09775, over 3835138.74 frames. ], batch size: 72, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:47:11,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=423380.0, ans=0.125 2024-08-10 06:47:22,977 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 06:47:24,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=423480.0, ans=0.125 2024-08-10 06:47:36,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=423480.0, ans=0.0 2024-08-10 06:47:55,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=22.5 2024-08-10 06:48:24,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13400, loss[loss=0.1162, beats_loss=0.01062, ecapa_loss=0.0002503, whisper_loss=0.1031, over 19068.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01237, ecapa_loss=0.0002864, whisper_loss=0.09743, over 3842845.86 frames. ], batch size: 71, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:48:27,878 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.785e+03 2024-08-10 06:48:36,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423880.0, ans=0.1 2024-08-10 06:48:42,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 3.269e+01 3.722e+01 4.193e+01 5.690e+01, threshold=7.444e+01, percent-clipped=0.0 2024-08-10 06:48:49,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=423980.0, ans=0.0 2024-08-10 06:48:57,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=424080.0, ans=0.2 2024-08-10 06:49:10,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=424180.0, ans=0.125 2024-08-10 06:49:18,828 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 06:49:19,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=424180.0, ans=0.95 2024-08-10 06:49:24,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=424280.0, ans=0.1 2024-08-10 06:49:36,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424280.0, ans=0.1 2024-08-10 06:49:38,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13450, loss[loss=0.1034, beats_loss=0.0117, ecapa_loss=0.0003291, whisper_loss=0.08842, over 21843.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.0125, ecapa_loss=0.0002874, whisper_loss=0.09716, over 3890029.43 frames. ], batch size: 92, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:49:54,677 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 06:49:54,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=424480.0, ans=0.2 2024-08-10 06:49:57,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=424480.0, ans=0.125 2024-08-10 06:50:19,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=424580.0, ans=0.125 2024-08-10 06:50:23,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=424680.0, ans=10.0 2024-08-10 06:50:37,805 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 06:50:44,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.08 vs. 
limit=12.0 2024-08-10 06:50:49,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=424880.0, ans=0.125 2024-08-10 06:50:50,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13500, loss[loss=0.1114, beats_loss=0.0129, ecapa_loss=0.0002467, whisper_loss=0.09601, over 20752.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01244, ecapa_loss=0.0002893, whisper_loss=0.09724, over 3909818.83 frames. ], batch size: 81, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:50:54,602 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 06:50:59,108 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 06:50:59,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=424880.0, ans=0.05 2024-08-10 06:51:05,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=424980.0, ans=0.0 2024-08-10 06:51:07,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.618e+01 3.316e+01 3.785e+01 4.530e+01 1.081e+02, threshold=7.570e+01, percent-clipped=1.0 2024-08-10 06:51:09,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=424980.0, ans=0.0 2024-08-10 06:51:14,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424980.0, ans=0.125 2024-08-10 06:51:15,356 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
17 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 06:51:39,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=425180.0, ans=0.125 2024-08-10 06:51:44,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=425180.0, ans=0.125 2024-08-10 06:51:44,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2024-08-10 06:52:01,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13550, loss[loss=0.1085, beats_loss=0.0152, ecapa_loss=0.0002668, whisper_loss=0.09062, over 23591.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01245, ecapa_loss=0.0002888, whisper_loss=0.0973, over 3914417.59 frames. ], batch size: 95, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:52:19,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=425480.0, ans=0.02 2024-08-10 06:52:23,244 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 06:52:23,541 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.164e+03 2024-08-10 06:52:29,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=425580.0, ans=0.125 2024-08-10 06:52:34,342 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 06:52:35,776 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
42 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 06:52:35,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=425580.0, ans=0.125 2024-08-10 06:52:49,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-10 06:52:59,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=425780.0, ans=0.125 2024-08-10 06:53:07,654 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 06:53:13,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13600, loss[loss=0.08891, beats_loss=0.01582, ecapa_loss=0.0002688, whisper_loss=0.0704, over 22441.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01232, ecapa_loss=0.0002865, whisper_loss=0.09788, over 3924744.66 frames. ], batch size: 97, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:53:23,975 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-10 06:53:24,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2024-08-10 06:53:30,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 3.163e+01 3.442e+01 4.144e+01 6.667e+01, threshold=6.884e+01, percent-clipped=0.0 2024-08-10 06:53:32,392 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 06:53:35,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=425980.0, ans=0.125 2024-08-10 06:54:09,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=426280.0, ans=0.0 2024-08-10 06:54:22,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-10 06:54:24,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13650, loss[loss=0.1342, beats_loss=0.01091, ecapa_loss=0.0003413, whisper_loss=0.1199, over 22075.00 frames. ], tot_loss[loss=0.1132, beats_loss=0.01231, ecapa_loss=0.0002868, whisper_loss=0.09804, over 3913818.05 frames. ], batch size: 88, lr: 1.75e-02, grad_scale: 67108864.0 2024-08-10 06:54:32,739 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 06:54:37,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=426480.0, ans=0.07 2024-08-10 06:54:53,551 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-10 06:55:01,376 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-10 06:55:04,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=426680.0, ans=0.125 2024-08-10 06:55:11,274 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 06:55:15,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2024-08-10 06:55:19,311 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-10 06:55:19,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=426780.0, ans=0.0 2024-08-10 06:55:33,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13700, loss[loss=0.1175, beats_loss=0.01252, ecapa_loss=0.0002355, whisper_loss=0.1027, over 18716.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01235, ecapa_loss=0.0002896, whisper_loss=0.09786, over 3904828.69 frames. ], batch size: 71, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:55:33,428 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 06:55:35,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.15 vs. limit=10.0 2024-08-10 06:55:46,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=426980.0, ans=0.125 2024-08-10 06:55:49,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+01 3.221e+01 3.630e+01 4.052e+01 7.780e+01, threshold=7.261e+01, percent-clipped=2.0 2024-08-10 06:56:39,434 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 06:56:39,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=427280.0, ans=0.2 2024-08-10 06:56:41,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=427280.0, ans=0.2 2024-08-10 06:56:43,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13750, loss[loss=0.1209, beats_loss=0.01243, ecapa_loss=0.0002632, whisper_loss=0.1058, over 18882.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01233, ecapa_loss=0.0002904, whisper_loss=0.09773, over 3864107.73 frames. 
], batch size: 74, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:57:08,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=427480.0, ans=0.0 2024-08-10 06:57:23,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=427580.0, ans=0.125 2024-08-10 06:57:27,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=427680.0, ans=0.125 2024-08-10 06:57:43,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=427780.0, ans=0.125 2024-08-10 06:57:45,054 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 06:57:51,864 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 06:57:53,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13800, loss[loss=0.09895, beats_loss=0.01451, ecapa_loss=0.0002799, whisper_loss=0.08164, over 18009.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01239, ecapa_loss=0.000289, whisper_loss=0.09654, over 3840088.23 frames. ], batch size: 73, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:57:57,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=427880.0, ans=0.0 2024-08-10 06:58:10,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+01 3.325e+01 3.732e+01 4.469e+01 6.721e+01, threshold=7.464e+01, percent-clipped=0.0 2024-08-10 06:58:21,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=428080.0, ans=0.125 2024-08-10 06:58:22,832 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 06:58:26,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-10 06:58:35,517 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 29 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-10 06:58:43,688 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 06:59:02,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13850, loss[loss=0.1162, beats_loss=0.01114, ecapa_loss=0.0002874, whisper_loss=0.1021, over 15256.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01241, ecapa_loss=0.0002882, whisper_loss=0.097, over 3873795.17 frames. ], batch size: 57, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 06:59:05,308 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-10 06:59:16,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=428480.0, ans=0.0 2024-08-10 06:59:25,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=8.0 2024-08-10 06:59:48,653 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 06:59:51,267 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 07:00:05,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2024-08-10 07:00:09,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13900, loss[loss=0.0956, beats_loss=0.01639, ecapa_loss=0.0002405, whisper_loss=0.0768, over 16509.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01234, ecapa_loss=0.0002895, whisper_loss=0.09736, over 3880269.69 frames. 
], batch size: 66, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:00:13,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=428880.0, ans=0.0 2024-08-10 07:00:21,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428880.0, ans=0.1 2024-08-10 07:00:26,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.310e+01 3.794e+01 4.612e+01 1.013e+02, threshold=7.587e+01, percent-clipped=2.0 2024-08-10 07:00:31,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=428980.0, ans=0.125 2024-08-10 07:00:41,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=429080.0, ans=0.0 2024-08-10 07:00:45,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429080.0, ans=0.1 2024-08-10 07:00:47,297 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.385e-01 2024-08-10 07:00:51,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=429180.0, ans=0.125 2024-08-10 07:00:52,492 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
26 from LS+wenet, 16 from Vox, 53 fro AS 2024-08-10 07:00:54,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429180.0, ans=0.1 2024-08-10 07:00:56,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429180.0, ans=0.125 2024-08-10 07:01:18,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 13950, loss[loss=0.1425, beats_loss=0.009847, ecapa_loss=0.0003275, whisper_loss=0.1293, over 23005.00 frames. ], tot_loss[loss=0.1136, beats_loss=0.01223, ecapa_loss=0.0002902, whisper_loss=0.09842, over 3893948.11 frames. ], batch size: 91, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:01:19,864 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-10 07:01:36,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=429480.0, ans=0.0 2024-08-10 07:01:39,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=429480.0, ans=0.125 2024-08-10 07:01:40,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=429480.0, ans=0.125 2024-08-10 07:01:42,060 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 07:02:06,729 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 07:02:11,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=429680.0, ans=0.125 2024-08-10 07:02:22,823 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 07:02:26,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14000, loss[loss=0.09847, beats_loss=0.01384, ecapa_loss=0.0003277, whisper_loss=0.08135, over 21411.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01226, ecapa_loss=0.0002883, whisper_loss=0.09815, over 3897297.75 frames. ], batch size: 90, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:02:42,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=429980.0, ans=0.0 2024-08-10 07:02:43,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 3.368e+01 3.902e+01 4.630e+01 2.044e+02, threshold=7.804e+01, percent-clipped=2.0 2024-08-10 07:03:05,098 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 07:03:16,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=430180.0, ans=0.02 2024-08-10 07:03:25,772 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 07:03:35,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14050, loss[loss=0.1073, beats_loss=0.0119, ecapa_loss=0.0002351, whisper_loss=0.09308, over 17363.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01226, ecapa_loss=0.0002859, whisper_loss=0.09856, over 3867701.33 frames. ], batch size: 65, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:03:42,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=430380.0, ans=0.05 2024-08-10 07:03:46,908 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 07:03:57,921 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 07:04:02,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=430580.0, ans=0.2 2024-08-10 07:04:06,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=430580.0, ans=0.125 2024-08-10 07:04:07,709 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 07:04:18,590 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 07:04:18,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=430680.0, ans=0.125 2024-08-10 07:04:21,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=430680.0, ans=0.125 2024-08-10 07:04:31,176 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 07:04:34,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.29 vs. limit=22.5 2024-08-10 07:04:38,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=430780.0, ans=0.125 2024-08-10 07:04:40,824 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 07:04:42,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=12.0 2024-08-10 07:04:44,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14100, loss[loss=0.1276, beats_loss=0.01272, ecapa_loss=0.0003524, whisper_loss=0.1114, over 21674.00 frames. ], tot_loss[loss=0.1141, beats_loss=0.0123, ecapa_loss=0.0002861, whisper_loss=0.09898, over 3884491.68 frames. 
], batch size: 92, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:04:46,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=430880.0, ans=0.09899494936611666 2024-08-10 07:04:50,327 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.650e+01 2024-08-10 07:04:52,588 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-10 07:04:55,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430880.0, ans=0.1 2024-08-10 07:05:00,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.555e+01 3.108e+01 3.411e+01 4.014e+01 7.175e+01, threshold=6.821e+01, percent-clipped=1.0 2024-08-10 07:05:03,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2024-08-10 07:05:32,304 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-10 07:05:43,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=431280.0, ans=0.125 2024-08-10 07:05:48,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=431280.0, ans=0.2 2024-08-10 07:05:51,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=431380.0, ans=0.0 2024-08-10 07:05:52,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14150, loss[loss=0.117, beats_loss=0.01143, ecapa_loss=0.0003103, whisper_loss=0.1025, over 22571.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01231, ecapa_loss=0.0002853, whisper_loss=0.09922, over 3902862.67 frames. 
], batch size: 92, lr: 1.74e-02, grad_scale: 67108864.0 2024-08-10 07:05:52,799 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.153e-01 2024-08-10 07:05:54,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431380.0, ans=0.1 2024-08-10 07:06:02,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.88 vs. limit=22.5 2024-08-10 07:06:03,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=431380.0, ans=0.125 2024-08-10 07:06:14,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.66 vs. limit=22.5 2024-08-10 07:06:24,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431580.0, ans=0.125 2024-08-10 07:06:34,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2024-08-10 07:06:42,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431680.0, ans=0.125 2024-08-10 07:07:01,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14200, loss[loss=0.1008, beats_loss=0.01164, ecapa_loss=0.0003357, whisper_loss=0.08584, over 21072.00 frames. ], tot_loss[loss=0.1143, beats_loss=0.01224, ecapa_loss=0.0002865, whisper_loss=0.09923, over 3902035.80 frames. ], batch size: 90, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:07:16,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.19 vs. 
limit=12.0 2024-08-10 07:07:18,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.671e+01 3.227e+01 3.786e+01 4.277e+01 7.139e+01, threshold=7.572e+01, percent-clipped=1.0 2024-08-10 07:07:19,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.48 vs. limit=10.0 2024-08-10 07:07:38,819 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 from AS 2024-08-10 07:07:59,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-10 07:08:03,297 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 07:08:06,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=432280.0, ans=0.125 2024-08-10 07:08:11,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14250, loss[loss=0.1365, beats_loss=0.01054, ecapa_loss=0.0002867, whisper_loss=0.1231, over 23631.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01223, ecapa_loss=0.0002862, whisper_loss=0.09954, over 3915631.75 frames. ], batch size: 93, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:08:17,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=432380.0, ans=0.0 2024-08-10 07:08:17,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=432380.0, ans=0.2 2024-08-10 07:08:20,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. 
limit=15.0 2024-08-10 07:08:21,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=432380.0, ans=0.0 2024-08-10 07:08:35,521 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 07:08:42,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=432580.0, ans=0.125 2024-08-10 07:08:43,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=432580.0, ans=0.125 2024-08-10 07:08:45,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=432580.0, ans=0.2 2024-08-10 07:09:16,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=432780.0, ans=0.125 2024-08-10 07:09:20,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14300, loss[loss=0.1263, beats_loss=0.01314, ecapa_loss=0.0002366, whisper_loss=0.1108, over 20569.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01236, ecapa_loss=0.0002834, whisper_loss=0.09791, over 3901710.35 frames. ], batch size: 77, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:09:24,796 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.565e+05 2024-08-10 07:09:24,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432880.0, ans=0.1 2024-08-10 07:09:32,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432980.0, ans=0.1 2024-08-10 07:09:34,239 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
17 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 07:09:35,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=432980.0, ans=0.125 2024-08-10 07:09:36,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 3.216e+01 3.597e+01 4.195e+01 6.015e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:09:46,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=433080.0, ans=0.2 2024-08-10 07:09:53,328 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 07:09:55,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.83 vs. limit=22.5 2024-08-10 07:09:59,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=433080.0, ans=0.125 2024-08-10 07:10:15,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=433280.0, ans=0.0 2024-08-10 07:10:19,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=433280.0, ans=0.125 2024-08-10 07:10:21,020 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 07:10:23,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=433280.0, ans=0.0 2024-08-10 07:10:28,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14350, loss[loss=0.1302, beats_loss=0.008137, ecapa_loss=0.0003857, whisper_loss=0.1182, over 16293.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.0123, ecapa_loss=0.0002862, whisper_loss=0.0981, over 3905114.36 frames. 
], batch size: 67, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:10:34,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433380.0, ans=0.0 2024-08-10 07:10:41,090 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 07:11:02,099 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 07:11:20,809 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 07:11:28,776 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 07:11:35,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=433880.0, ans=0.125 2024-08-10 07:11:36,468 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14400, loss[loss=0.134, beats_loss=0.01012, ecapa_loss=0.0003419, whisper_loss=0.1204, over 23173.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01226, ecapa_loss=0.0002881, whisper_loss=0.0986, over 3910114.39 frames. ], batch size: 93, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:11:48,089 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 20 from Vox, 40 from AS 2024-08-10 07:11:48,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=433880.0, ans=0.2 2024-08-10 07:11:53,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 3.382e+01 3.755e+01 4.286e+01 6.808e+01, threshold=7.511e+01, percent-clipped=0.0 2024-08-10 07:12:10,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=434080.0, ans=0.2 2024-08-10 07:12:45,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 3, batch 14450, loss[loss=0.1073, beats_loss=0.01544, ecapa_loss=0.0002615, whisper_loss=0.08925, over 14360.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01231, ecapa_loss=0.0002891, whisper_loss=0.09848, over 3919875.26 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 67108864.0 2024-08-10 07:12:46,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434380.0, ans=0.1 2024-08-10 07:12:48,041 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 from AS 2024-08-10 07:13:02,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434480.0, ans=0.1 2024-08-10 07:13:08,867 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 from AS 2024-08-10 07:13:10,089 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 from AS 2024-08-10 07:13:15,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=434580.0, ans=0.2 2024-08-10 07:13:23,242 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 07:13:25,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=434680.0, ans=0.0 2024-08-10 07:13:29,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434680.0, ans=0.125 2024-08-10 07:13:33,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=434680.0, ans=0.125 2024-08-10 07:13:37,571 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-3.pt 2024-08-10 07:14:14,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 0, loss[loss=0.1021, beats_loss=0.01341, ecapa_loss=0.0003449, whisper_loss=0.08528, over 18980.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01341, ecapa_loss=0.0003449, whisper_loss=0.08528, over 18980.00 frames. ], batch size: 77, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:14:14,468 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 07:14:55,832 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on ASR_libri: loss=0.268, beats_loss=0, ecapa_loss=0.0008857, whisper_loss=0.2592, over 922467.00 frames. 2024-08-10 07:15:10,900 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on SV_voxceleb1: loss=0.007801, beats_loss=0, ecapa_loss=0.0007801, whisper_loss=0, over 939242.00 frames. 2024-08-10 07:17:09,537 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on AT_audioset: loss=0.02834, beats_loss=0.02834, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 07:17:09,540 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 07:17:36,658 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 07:17:36,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=434870.0, ans=0.1 2024-08-10 07:17:45,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=434870.0, ans=0.125 2024-08-10 07:18:12,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.318e+01 3.888e+01 4.583e+01 8.270e+01, threshold=7.777e+01, percent-clipped=1.0 2024-08-10 07:18:44,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.69 vs. limit=10.0 2024-08-10 07:19:19,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 50, loss[loss=0.1047, beats_loss=0.01336, ecapa_loss=0.0002908, whisper_loss=0.08845, over 21982.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01224, ecapa_loss=0.0002929, whisper_loss=0.09853, over 929191.27 frames. ], batch size: 89, lr: 1.62e-02, grad_scale: 67108864.0 2024-08-10 07:19:42,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=435370.0, ans=0.125 2024-08-10 07:19:47,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=435370.0, ans=0.035 2024-08-10 07:19:57,535 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 from AS 2024-08-10 07:20:17,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=435470.0, ans=0.125 2024-08-10 07:20:19,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=435470.0, ans=0.0 2024-08-10 07:20:38,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=435570.0, ans=0.0 2024-08-10 07:21:00,175 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 07:21:15,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435670.0, ans=0.125 2024-08-10 07:21:21,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 100, loss[loss=0.1107, beats_loss=0.01327, ecapa_loss=0.0002968, whisper_loss=0.09445, over 18273.00 frames. ], tot_loss[loss=0.1144, beats_loss=0.01185, ecapa_loss=0.000293, whisper_loss=0.09959, over 1597093.19 frames. ], batch size: 74, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:21:22,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=435770.0, ans=0.125 2024-08-10 07:21:49,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=435870.0, ans=0.025 2024-08-10 07:22:14,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.841e+01 3.372e+01 3.715e+01 4.340e+01 6.479e+01, threshold=7.429e+01, percent-clipped=0.0 2024-08-10 07:22:18,224 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 07:22:18,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=12.0 2024-08-10 07:22:22,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=435970.0, ans=0.125 2024-08-10 07:22:27,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=435970.0, ans=0.125 2024-08-10 07:22:29,933 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 from AS 2024-08-10 07:22:34,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=436070.0, ans=0.125 2024-08-10 07:23:00,642 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 07:23:14,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 150, loss[loss=0.1353, beats_loss=0.006601, ecapa_loss=0.0004109, whisper_loss=0.1245, over 16462.00 frames. ], tot_loss[loss=0.1146, beats_loss=0.01193, ecapa_loss=0.0002834, whisper_loss=0.09986, over 2086888.14 frames. ], batch size: 67, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:23:26,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=436270.0, ans=0.125 2024-08-10 07:23:59,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=436470.0, ans=0.0 2024-08-10 07:24:37,189 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 07:24:38,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 200, loss[loss=0.1155, beats_loss=0.01197, ecapa_loss=0.0002989, whisper_loss=0.1005, over 22485.00 frames. 
], tot_loss[loss=0.1143, beats_loss=0.01193, ecapa_loss=0.0002794, whisper_loss=0.09959, over 2477239.39 frames. ], batch size: 88, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:24:41,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=436770.0, ans=0.125 2024-08-10 07:24:45,896 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 07:24:56,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=436870.0, ans=0.2 2024-08-10 07:24:58,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=436870.0, ans=0.0 2024-08-10 07:25:11,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-10 07:25:14,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.684e+01 3.326e+01 3.682e+01 4.488e+01 7.047e+01, threshold=7.364e+01, percent-clipped=0.0 2024-08-10 07:25:40,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=437170.0, ans=0.2 2024-08-10 07:25:40,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=437170.0, ans=0.07 2024-08-10 07:25:57,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 250, loss[loss=0.1169, beats_loss=0.01414, ecapa_loss=0.0002185, whisper_loss=0.1006, over 14141.00 frames. ], tot_loss[loss=0.1137, beats_loss=0.01205, ecapa_loss=0.0002788, whisper_loss=0.09885, over 2751853.69 frames. ], batch size: 54, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:26:07,303 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 07:26:16,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=437370.0, ans=0.125 2024-08-10 07:26:18,501 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-10 07:26:23,765 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 from AS 2024-08-10 07:26:25,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.50 vs. limit=10.0 2024-08-10 07:26:26,414 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 from AS 2024-08-10 07:26:27,878 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 from AS 2024-08-10 07:26:35,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=437470.0, ans=0.0 2024-08-10 07:26:49,410 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 07:26:55,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=437570.0, ans=0.125 2024-08-10 07:26:56,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-10 07:27:12,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 300, loss[loss=0.08745, beats_loss=0.01579, ecapa_loss=0.0002801, whisper_loss=0.06886, over 14261.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01214, ecapa_loss=0.0002791, whisper_loss=0.09817, over 2989460.42 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:27:17,595 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 18 from Vox, 45 from AS 2024-08-10 07:27:23,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=437770.0, ans=0.2 2024-08-10 07:27:24,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=437770.0, ans=0.0 2024-08-10 07:27:26,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-10 07:27:42,689 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 07:27:46,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.173e+01 3.597e+01 4.305e+01 6.522e+01, threshold=7.194e+01, percent-clipped=0.0 2024-08-10 07:27:49,814 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 07:28:10,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0 2024-08-10 07:28:18,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=438170.0, ans=0.1 2024-08-10 07:28:26,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2024-08-10 07:28:27,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 350, loss[loss=0.1196, beats_loss=0.00833, ecapa_loss=0.0003755, whisper_loss=0.1075, over 13239.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01206, ecapa_loss=0.0002778, whisper_loss=0.09791, over 3168223.99 frames. 
], batch size: 53, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:28:29,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=438270.0, ans=0.0 2024-08-10 07:28:37,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=438270.0, ans=0.0 2024-08-10 07:28:37,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438270.0, ans=0.125 2024-08-10 07:28:45,235 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 07:29:29,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2024-08-10 07:29:37,189 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 15 from LS+wenet, 33 from Vox, 27 from AS 2024-08-10 07:29:42,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 400, loss[loss=0.09978, beats_loss=0.01383, ecapa_loss=0.0002633, whisper_loss=0.08331, over 22367.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01194, ecapa_loss=0.0002788, whisper_loss=0.09758, over 3301966.38 frames. ], batch size: 90, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:29:43,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=438770.0, ans=0.0 2024-08-10 07:29:53,561 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 12 from Vox, 30 from AS 2024-08-10 07:29:58,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=438870.0, ans=0.125 2024-08-10 07:30:05,519 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 07:30:11,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-08-10 07:30:16,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+01 3.285e+01 3.710e+01 4.185e+01 8.184e+01, threshold=7.420e+01, percent-clipped=1.0 2024-08-10 07:30:32,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=439070.0, ans=0.125 2024-08-10 07:30:36,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=439070.0, ans=0.125 2024-08-10 07:30:37,785 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 23 from Vox, 31 from AS 2024-08-10 07:30:47,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2024-08-10 07:30:51,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.08 vs. limit=15.0 2024-08-10 07:30:54,493 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 450, loss[loss=0.1044, beats_loss=0.01275, ecapa_loss=0.0002924, whisper_loss=0.08871, over 21267.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002786, whisper_loss=0.09674, over 3384126.08 frames. ], batch size: 88, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:31:03,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=439270.0, ans=0.0 2024-08-10 07:31:06,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.78 vs. 
limit=15.0 2024-08-10 07:31:15,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-10 07:31:16,543 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 13 from Vox, 25 from AS 2024-08-10 07:31:38,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=439570.0, ans=0.2 2024-08-10 07:31:45,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=439570.0, ans=0.125 2024-08-10 07:31:45,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=439570.0, ans=0.125 2024-08-10 07:31:52,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=439670.0, ans=0.125 2024-08-10 07:31:55,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2024-08-10 07:32:00,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 500, loss[loss=0.1268, beats_loss=0.01231, ecapa_loss=0.000281, whisper_loss=0.1117, over 22321.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01206, ecapa_loss=0.0002764, whisper_loss=0.09681, over 3497898.76 frames. ], batch size: 89, lr: 1.61e-02, grad_scale: 67108864.0 2024-08-10 07:32:11,543 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 07:32:29,843 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-44000.pt 2024-08-10 07:32:33,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.388e+01 2.971e+01 3.310e+01 3.858e+01 7.927e+01, threshold=6.621e+01, percent-clipped=1.0 2024-08-10 07:32:40,842 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 18 from Vox, 19 from AS 2024-08-10 07:32:42,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=440070.0, ans=0.035 2024-08-10 07:32:51,324 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 from AS 2024-08-10 07:32:56,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2024-08-10 07:32:57,872 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 from AS 2024-08-10 07:33:05,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2024-08-10 07:33:09,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 550, loss[loss=0.1085, beats_loss=0.01292, ecapa_loss=0.0002859, whisper_loss=0.09267, over 15641.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01196, ecapa_loss=0.0002745, whisper_loss=0.09753, over 3589805.67 frames. ], batch size: 65, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:33:13,424 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-10 07:33:20,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440270.0, ans=0.1 2024-08-10 07:33:23,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=440370.0, ans=0.2 2024-08-10 07:33:24,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440370.0, ans=0.1 2024-08-10 07:33:56,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=15.0 2024-08-10 07:34:14,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 600, loss[loss=0.1169, beats_loss=0.01051, ecapa_loss=0.0002293, whisper_loss=0.1041, over 18454.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01209, ecapa_loss=0.0002699, whisper_loss=0.0971, over 3639924.19 frames. 
], batch size: 68, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:34:20,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=440770.0, ans=0.125 2024-08-10 07:34:31,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=440870.0, ans=0.125 2024-08-10 07:34:41,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440970.0, ans=0.1 2024-08-10 07:34:45,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.403e+01 3.004e+01 3.329e+01 3.797e+01 6.092e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 07:34:48,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=440970.0, ans=0.0 2024-08-10 07:34:50,998 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.076e+01 2024-08-10 07:35:06,005 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 from AS 2024-08-10 07:35:07,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=441170.0, ans=0.0 2024-08-10 07:35:08,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. 
limit=22.5 2024-08-10 07:35:10,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441170.0, ans=0.125 2024-08-10 07:35:15,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=441170.0, ans=0.2 2024-08-10 07:35:20,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 650, loss[loss=0.1076, beats_loss=0.01155, ecapa_loss=0.000255, whisper_loss=0.0935, over 14778.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01205, ecapa_loss=0.0002719, whisper_loss=0.09647, over 3669011.18 frames. ], batch size: 56, lr: 1.61e-02, grad_scale: 134217728.0 2024-08-10 07:35:58,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=441570.0, ans=0.0 2024-08-10 07:36:10,616 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:36:10,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2024-08-10 07:36:20,068 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 07:36:20,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2024-08-10 07:36:21,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=441670.0, ans=0.2 2024-08-10 07:36:26,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 700, loss[loss=0.1044, beats_loss=0.00978, ecapa_loss=0.0002803, whisper_loss=0.0918, over 17005.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01205, ecapa_loss=0.0002722, whisper_loss=0.09648, over 3690343.14 frames. 
], batch size: 64, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:36:34,433 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 07:36:48,921 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 07:36:49,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=441870.0, ans=0.125 2024-08-10 07:36:54,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.64 vs. limit=10.0 2024-08-10 07:36:56,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+01 3.137e+01 3.551e+01 4.143e+01 1.211e+02, threshold=7.103e+01, percent-clipped=4.0 2024-08-10 07:36:58,177 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 07:37:00,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-10 07:37:08,800 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-10 07:37:14,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=442070.0, ans=0.125 2024-08-10 07:37:16,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=442070.0, ans=0.5 2024-08-10 07:37:19,471 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 07:37:30,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=442170.0, ans=0.125 2024-08-10 07:37:32,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 750, loss[loss=0.09217, beats_loss=0.01209, ecapa_loss=0.0002325, whisper_loss=0.07776, over 19514.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01214, ecapa_loss=0.0002714, whisper_loss=0.0958, over 3730294.53 frames. ], batch size: 77, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:37:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442270.0, ans=0.1 2024-08-10 07:37:48,036 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 07:37:49,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=442370.0, ans=0.0 2024-08-10 07:37:49,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=442370.0, ans=0.125 2024-08-10 07:38:10,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=442570.0, ans=0.125 2024-08-10 07:38:26,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=442670.0, ans=0.5 2024-08-10 07:38:32,596 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-10 07:38:37,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 800, loss[loss=0.1213, beats_loss=0.01398, ecapa_loss=0.0002281, whisper_loss=0.105, over 23460.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01221, ecapa_loss=0.0002704, whisper_loss=0.09559, over 3778252.64 frames. 
], batch size: 91, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:38:43,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442770.0, ans=0.1 2024-08-10 07:38:53,367 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 32 from Vox, 27 fro AS 2024-08-10 07:38:54,586 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 07:38:58,291 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 07:39:07,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 2.938e+01 3.331e+01 3.852e+01 7.963e+01, threshold=6.661e+01, percent-clipped=1.0 2024-08-10 07:39:14,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:39:18,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=443070.0, ans=0.2 2024-08-10 07:39:28,685 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 07:39:42,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 850, loss[loss=0.1217, beats_loss=0.01298, ecapa_loss=0.0002547, whisper_loss=0.1062, over 22724.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01208, ecapa_loss=0.0002707, whisper_loss=0.09653, over 3792595.72 frames. ], batch size: 86, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:39:54,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=443270.0, ans=0.1 2024-08-10 07:40:04,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=443370.0, ans=0.0 2024-08-10 07:40:20,151 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
23 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 07:40:25,616 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 9 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 07:40:38,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=443670.0, ans=0.0 2024-08-10 07:40:48,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 900, loss[loss=0.1319, beats_loss=0.01296, ecapa_loss=0.0002532, whisper_loss=0.1164, over 23427.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01204, ecapa_loss=0.0002693, whisper_loss=0.09718, over 3799936.78 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:41:03,170 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 07:41:11,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=443870.0, ans=0.125 2024-08-10 07:41:11,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2024-08-10 07:41:18,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 3.112e+01 3.456e+01 3.897e+01 5.995e+01, threshold=6.912e+01, percent-clipped=0.0 2024-08-10 07:41:22,798 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
34 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 07:41:24,360 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.315e+00 2024-08-10 07:41:24,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=443970.0, ans=0.0 2024-08-10 07:41:25,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443970.0, ans=0.1 2024-08-10 07:41:25,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=443970.0, ans=0.0 2024-08-10 07:41:44,654 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 07:41:45,847 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 31 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 07:41:48,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-08-10 07:41:53,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 950, loss[loss=0.1115, beats_loss=0.0114, ecapa_loss=0.000269, whisper_loss=0.09744, over 22530.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01207, ecapa_loss=0.0002691, whisper_loss=0.09629, over 3810426.08 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:41:57,916 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 07:42:04,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444270.0, ans=0.1 2024-08-10 07:42:08,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=444370.0, ans=0.07 2024-08-10 07:42:19,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=444470.0, ans=0.125 2024-08-10 07:42:20,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=444470.0, ans=0.125 2024-08-10 07:42:25,138 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 07:42:27,530 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 07:42:39,380 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 17 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-10 07:42:53,946 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 07:42:54,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=444670.0, ans=0.125 2024-08-10 07:42:59,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1000, loss[loss=0.07413, beats_loss=0.01331, ecapa_loss=0.0001748, whisper_loss=0.05907, over 16211.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01213, ecapa_loss=0.000267, whisper_loss=0.09562, over 3826962.87 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:43:01,892 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 07:43:15,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-08-10 07:43:19,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=444870.0, ans=0.125 2024-08-10 07:43:25,657 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 07:43:29,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.226e+01 3.648e+01 4.312e+01 7.271e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 07:43:51,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-08-10 07:43:55,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2024-08-10 07:43:57,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=445170.0, ans=0.07 2024-08-10 07:44:04,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1050, loss[loss=0.109, beats_loss=0.01017, ecapa_loss=0.0002345, whisper_loss=0.09645, over 16682.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01212, ecapa_loss=0.0002661, whisper_loss=0.09538, over 3834680.85 frames. 
], batch size: 60, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:44:10,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=445270.0, ans=0.125 2024-08-10 07:44:23,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=445370.0, ans=0.125 2024-08-10 07:44:32,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2024-08-10 07:44:38,781 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 07:45:02,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445670.0, ans=0.1 2024-08-10 07:45:09,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1100, loss[loss=0.1073, beats_loss=0.01011, ecapa_loss=0.0003147, whisper_loss=0.0941, over 14752.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01204, ecapa_loss=0.0002672, whisper_loss=0.09585, over 3819975.43 frames. ], batch size: 59, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:45:10,090 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 07:45:12,577 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 07:45:22,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. 
limit=5.0 2024-08-10 07:45:26,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=445870.0, ans=0.125 2024-08-10 07:45:30,710 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:45:38,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=445970.0, ans=0.125 2024-08-10 07:45:38,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=445970.0, ans=10.0 2024-08-10 07:45:39,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 3.161e+01 3.477e+01 3.934e+01 8.780e+01, threshold=6.953e+01, percent-clipped=2.0 2024-08-10 07:45:46,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-10 07:45:56,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=446070.0, ans=0.0 2024-08-10 07:46:04,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446170.0, ans=0.1 2024-08-10 07:46:12,105 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 07:46:14,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1150, loss[loss=0.09078, beats_loss=0.01315, ecapa_loss=0.000207, whisper_loss=0.07556, over 15135.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01211, ecapa_loss=0.0002675, whisper_loss=0.09637, over 3834028.52 frames. ], batch size: 55, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:46:21,416 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 07:46:33,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=446370.0, ans=0.125 2024-08-10 07:46:41,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2024-08-10 07:46:53,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=446570.0, ans=0.0 2024-08-10 07:47:11,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=446670.0, ans=0.125 2024-08-10 07:47:12,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=446670.0, ans=0.0 2024-08-10 07:47:20,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1200, loss[loss=0.1138, beats_loss=0.009573, ecapa_loss=0.0003138, whisper_loss=0.1011, over 19354.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01215, ecapa_loss=0.0002674, whisper_loss=0.09621, over 3823669.97 frames. ], batch size: 77, lr: 1.60e-02, grad_scale: 134217728.0 2024-08-10 07:47:45,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.79 vs. 
limit=12.0 2024-08-10 07:47:50,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.044e+01 3.412e+01 3.944e+01 6.015e+01, threshold=6.823e+01, percent-clipped=0.0 2024-08-10 07:48:11,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=447070.0, ans=0.0 2024-08-10 07:48:11,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=447070.0, ans=0.0 2024-08-10 07:48:23,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=447170.0, ans=0.125 2024-08-10 07:48:26,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=447170.0, ans=0.0 2024-08-10 07:48:28,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1250, loss[loss=0.08958, beats_loss=0.01335, ecapa_loss=0.0002316, whisper_loss=0.07391, over 17432.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01214, ecapa_loss=0.0002668, whisper_loss=0.09559, over 3833859.14 frames. ], batch size: 72, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:49:02,724 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 07:49:15,444 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 07:49:33,277 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 07:49:39,970 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1300, loss[loss=0.1271, beats_loss=0.0102, ecapa_loss=0.0003097, whisper_loss=0.1138, over 21724.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01207, ecapa_loss=0.0002668, whisper_loss=0.09651, over 3840353.40 frames. 
], batch size: 89, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:49:50,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447770.0, ans=0.1 2024-08-10 07:49:53,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.50 vs. limit=22.5 2024-08-10 07:50:12,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 3.001e+01 3.337e+01 3.796e+01 6.277e+01, threshold=6.674e+01, percent-clipped=0.0 2024-08-10 07:50:14,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=447970.0, ans=0.125 2024-08-10 07:50:19,279 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 07:50:19,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=447970.0, ans=0.125 2024-08-10 07:50:19,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=12.0 2024-08-10 07:50:20,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=448070.0, ans=0.0 2024-08-10 07:50:35,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=15.0 2024-08-10 07:50:51,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1350, loss[loss=0.1044, beats_loss=0.01228, ecapa_loss=0.000234, whisper_loss=0.08976, over 17660.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0121, ecapa_loss=0.0002636, whisper_loss=0.09692, over 3848156.96 frames. 
], batch size: 70, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:51:04,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-08-10 07:51:04,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=448370.0, ans=0.125 2024-08-10 07:51:11,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-08-10 07:51:31,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=448470.0, ans=0.125 2024-08-10 07:51:32,918 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 07:51:59,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=448670.0, ans=0.025 2024-08-10 07:51:59,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-08-10 07:52:03,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1400, loss[loss=0.1243, beats_loss=0.01062, ecapa_loss=0.0002431, whisper_loss=0.1113, over 19546.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01202, ecapa_loss=0.0002631, whisper_loss=0.09701, over 3854429.31 frames. 
], batch size: 73, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:52:05,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=448770.0, ans=0.125 2024-08-10 07:52:07,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=448770.0, ans=0.0 2024-08-10 07:52:08,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=448770.0, ans=0.0 2024-08-10 07:52:20,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-10 07:52:37,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 2.977e+01 3.358e+01 3.939e+01 6.744e+01, threshold=6.717e+01, percent-clipped=2.0 2024-08-10 07:52:49,442 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-10 07:52:53,800 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 07:52:59,169 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 07:53:00,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=449170.0, ans=0.0 2024-08-10 07:53:15,951 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 39 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 07:53:17,574 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1450, loss[loss=0.1337, beats_loss=0.01063, ecapa_loss=0.0002455, whisper_loss=0.1206, over 23671.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01205, ecapa_loss=0.0002617, whisper_loss=0.09608, over 3799674.03 frames. 
], batch size: 92, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:53:52,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=449270.0, ans=0.125 2024-08-10 07:54:18,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=449470.0, ans=0.0 2024-08-10 07:54:22,536 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 07:54:26,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=449470.0, ans=0.0 2024-08-10 07:54:31,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=449570.0, ans=0.125 2024-08-10 07:54:36,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-10 07:54:42,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449570.0, ans=0.1 2024-08-10 07:54:48,262 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-10 07:54:51,684 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 07:55:00,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2024-08-10 07:55:00,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1500, loss[loss=0.08987, beats_loss=0.01414, ecapa_loss=0.000273, whisper_loss=0.073, over 17973.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01213, ecapa_loss=0.0002596, whisper_loss=0.09581, over 3800232.48 frames. 
], batch size: 74, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:55:09,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=449770.0, ans=0.0 2024-08-10 07:55:27,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449870.0, ans=0.125 2024-08-10 07:55:34,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=449970.0, ans=0.125 2024-08-10 07:55:35,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.938e+01 3.327e+01 3.975e+01 6.102e+01, threshold=6.654e+01, percent-clipped=0.0 2024-08-10 07:55:43,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=449970.0, ans=0.125 2024-08-10 07:55:51,152 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 07:56:07,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=450170.0, ans=0.125 2024-08-10 07:56:16,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1550, loss[loss=0.1049, beats_loss=0.01196, ecapa_loss=0.0002707, whisper_loss=0.0902, over 22261.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01211, ecapa_loss=0.0002607, whisper_loss=0.09531, over 3794675.09 frames. ], batch size: 88, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:56:27,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450270.0, ans=0.1 2024-08-10 07:56:29,241 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 07:56:37,167 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 07:56:54,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-08-10 07:57:14,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450570.0, ans=0.125 2024-08-10 07:57:20,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-10 07:57:23,136 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 07:57:32,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1600, loss[loss=0.1095, beats_loss=0.0135, ecapa_loss=0.0001975, whisper_loss=0.09405, over 21359.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01217, ecapa_loss=0.0002601, whisper_loss=0.09566, over 3828860.64 frames. ], batch size: 81, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:57:38,335 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 07:57:48,729 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 07:57:48,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450870.0, ans=0.125 2024-08-10 07:57:52,760 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 07:58:07,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 3.094e+01 3.435e+01 3.999e+01 7.884e+01, threshold=6.871e+01, percent-clipped=1.0 2024-08-10 07:58:24,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2024-08-10 07:58:27,974 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 07:58:41,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=451170.0, ans=0.0 2024-08-10 07:58:46,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1650, loss[loss=0.1204, beats_loss=0.01026, ecapa_loss=0.0002416, whisper_loss=0.1077, over 17801.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.0122, ecapa_loss=0.0002591, whisper_loss=0.09587, over 3834646.00 frames. ], batch size: 67, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 07:58:47,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451270.0, ans=0.1 2024-08-10 07:58:55,459 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 07:59:07,789 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-10 07:59:17,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451470.0, ans=0.1 2024-08-10 07:59:18,950 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-10 07:59:26,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=451470.0, ans=0.04949747468305833 2024-08-10 07:59:58,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1700, loss[loss=0.1121, beats_loss=0.009528, ecapa_loss=0.0003146, whisper_loss=0.09942, over 18179.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01216, ecapa_loss=0.0002584, whisper_loss=0.09619, over 3859081.91 frames. ], batch size: 75, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:00:01,913 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-10 08:00:22,127 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 33 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 08:00:22,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=451870.0, ans=0.0 2024-08-10 08:00:23,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-10 08:00:31,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+01 3.130e+01 3.389e+01 3.948e+01 7.641e+01, threshold=6.778e+01, percent-clipped=2.0 2024-08-10 08:00:44,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=452070.0, ans=0.125 2024-08-10 08:00:57,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.42 vs. limit=10.0 2024-08-10 08:01:01,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-10 08:01:01,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-08-10 08:01:08,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1750, loss[loss=0.1294, beats_loss=0.012, ecapa_loss=0.0002811, whisper_loss=0.1146, over 23396.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01209, ecapa_loss=0.0002609, whisper_loss=0.09634, over 3829797.34 frames. ], batch size: 93, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:01:25,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. 
limit=15.0 2024-08-10 08:01:50,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452570.0, ans=0.1 2024-08-10 08:02:18,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1800, loss[loss=0.08674, beats_loss=0.01575, ecapa_loss=0.0002279, whisper_loss=0.06871, over 21855.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01202, ecapa_loss=0.0002618, whisper_loss=0.09584, over 3808095.13 frames. ], batch size: 90, lr: 1.59e-02, grad_scale: 134217728.0 2024-08-10 08:02:24,081 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 08:02:24,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=452770.0, ans=0.125 2024-08-10 08:02:30,374 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 08:02:32,718 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-10 08:02:49,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+01 3.196e+01 3.582e+01 4.110e+01 5.783e+01, threshold=7.164e+01, percent-clipped=0.0 2024-08-10 08:03:26,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1850, loss[loss=0.1156, beats_loss=0.01131, ecapa_loss=0.0002687, whisper_loss=0.1016, over 20561.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01201, ecapa_loss=0.0002622, whisper_loss=0.09637, over 3807120.61 frames. ], batch size: 80, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:03:27,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=453270.0, ans=22.5 2024-08-10 08:03:29,882 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 08:03:42,343 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 08:03:50,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-10 08:04:07,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=19.44 vs. limit=15.0 2024-08-10 08:04:15,691 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 08:04:27,977 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 08:04:39,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1900, loss[loss=0.1247, beats_loss=0.01169, ecapa_loss=0.0002944, whisper_loss=0.1101, over 22652.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01204, ecapa_loss=0.0002655, whisper_loss=0.09632, over 3832748.34 frames. ], batch size: 90, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:04:44,620 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 08:05:07,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-10 08:05:10,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 3.027e+01 3.393e+01 3.845e+01 7.336e+01, threshold=6.786e+01, percent-clipped=1.0 2024-08-10 08:05:13,857 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 08:05:16,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=453970.0, ans=0.125 2024-08-10 08:05:18,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=453970.0, ans=6.0 2024-08-10 08:05:21,090 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-10 08:05:25,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=454070.0, ans=0.07 2024-08-10 08:05:27,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-08-10 08:05:35,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=454170.0, ans=0.125 2024-08-10 08:05:42,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=454170.0, ans=0.125 2024-08-10 08:05:49,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 1950, loss[loss=0.1171, beats_loss=0.01034, ecapa_loss=0.0002638, whisper_loss=0.1041, over 16911.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01208, ecapa_loss=0.0002701, whisper_loss=0.09621, over 3823521.89 frames. ], batch size: 62, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:05:58,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.01 vs. 
limit=15.0 2024-08-10 08:06:05,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=454370.0, ans=0.125 2024-08-10 08:06:19,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2024-08-10 08:06:22,175 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 08:06:37,468 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 08:06:45,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-08-10 08:07:00,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2000, loss[loss=0.1254, beats_loss=0.01004, ecapa_loss=0.0002355, whisper_loss=0.113, over 18720.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01205, ecapa_loss=0.0002752, whisper_loss=0.09575, over 3805143.99 frames. ], batch size: 68, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:07:15,789 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 08:07:16,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2024-08-10 08:07:22,562 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 08:07:23,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. 
limit=15.0 2024-08-10 08:07:24,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=454870.0, ans=0.0 2024-08-10 08:07:31,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=454970.0, ans=0.07 2024-08-10 08:07:34,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.624e+01 3.304e+01 3.702e+01 4.234e+01 5.771e+01, threshold=7.405e+01, percent-clipped=0.0 2024-08-10 08:07:39,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=454970.0, ans=0.2 2024-08-10 08:08:01,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=455170.0, ans=0.125 2024-08-10 08:08:13,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2050, loss[loss=0.1043, beats_loss=0.01039, ecapa_loss=0.0003011, whisper_loss=0.0909, over 21277.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01209, ecapa_loss=0.0002765, whisper_loss=0.09545, over 3817094.46 frames. ], batch size: 84, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:08:19,834 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-10 08:08:30,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=455370.0, ans=0.035 2024-08-10 08:08:31,267 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 15 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-10 08:08:35,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455370.0, ans=0.1 2024-08-10 08:08:53,833 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 08:09:24,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2100, loss[loss=0.1108, beats_loss=0.01078, ecapa_loss=0.0002615, whisper_loss=0.09738, over 14943.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01224, ecapa_loss=0.0002739, whisper_loss=0.09454, over 3780267.04 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:09:27,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=455770.0, ans=0.125 2024-08-10 08:09:29,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=15.0 2024-08-10 08:09:40,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=12.0 2024-08-10 08:09:44,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2024-08-10 08:09:45,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-08-10 08:09:49,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=455870.0, ans=0.1 2024-08-10 08:09:56,749 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 08:09:57,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.943e+01 3.340e+01 3.951e+01 7.714e+01, threshold=6.679e+01, percent-clipped=1.0 2024-08-10 08:10:07,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.61 vs. 
limit=15.0 2024-08-10 08:10:19,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=456070.0, ans=0.125 2024-08-10 08:10:19,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=456070.0, ans=0.0 2024-08-10 08:10:27,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=456170.0, ans=0.0 2024-08-10 08:10:31,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=456170.0, ans=0.07 2024-08-10 08:10:36,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2150, loss[loss=0.1183, beats_loss=0.01203, ecapa_loss=0.0002801, whisper_loss=0.1035, over 19262.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01221, ecapa_loss=0.0002753, whisper_loss=0.09545, over 3806241.06 frames. ], batch size: 78, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:10:45,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=456270.0, ans=0.125 2024-08-10 08:10:45,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=456270.0, ans=0.125 2024-08-10 08:10:54,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=456370.0, ans=0.125 2024-08-10 08:10:56,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=456370.0, ans=0.125 2024-08-10 08:10:57,650 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
31 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 08:11:02,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=456370.0, ans=0.0 2024-08-10 08:11:16,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=456470.0, ans=0.07 2024-08-10 08:11:18,268 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-10 08:11:29,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=456570.0, ans=0.07 2024-08-10 08:11:51,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2200, loss[loss=0.1222, beats_loss=0.009952, ecapa_loss=0.0002969, whisper_loss=0.1092, over 21557.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01218, ecapa_loss=0.0002751, whisper_loss=0.09605, over 3802985.42 frames. ], batch size: 82, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:12:05,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=12.0 2024-08-10 08:12:26,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 3.107e+01 3.618e+01 4.202e+01 6.900e+01, threshold=7.235e+01, percent-clipped=1.0 2024-08-10 08:12:58,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=457170.0, ans=0.0 2024-08-10 08:13:05,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2250, loss[loss=0.06888, beats_loss=0.01493, ecapa_loss=0.0003635, whisper_loss=0.05031, over 14066.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.0121, ecapa_loss=0.0002761, whisper_loss=0.0969, over 3808091.14 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:13:32,808 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 08:13:36,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.63 vs. limit=15.0 2024-08-10 08:13:37,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=457470.0, ans=0.125 2024-08-10 08:13:45,708 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 08:13:54,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=457570.0, ans=0.125 2024-08-10 08:14:02,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=457570.0, ans=10.0 2024-08-10 08:14:05,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-10 08:14:21,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2300, loss[loss=0.1052, beats_loss=0.01222, ecapa_loss=0.0002731, whisper_loss=0.0903, over 16185.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01212, ecapa_loss=0.0002765, whisper_loss=0.09709, over 3844997.92 frames. ], batch size: 63, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:14:22,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. 
limit=6.0 2024-08-10 08:14:52,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=457970.0, ans=0.95 2024-08-10 08:14:56,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 3.052e+01 3.526e+01 3.987e+01 6.394e+01, threshold=7.053e+01, percent-clipped=0.0 2024-08-10 08:14:57,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=457970.0, ans=0.0 2024-08-10 08:15:01,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=457970.0, ans=0.125 2024-08-10 08:15:37,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2350, loss[loss=0.1299, beats_loss=0.01254, ecapa_loss=0.0002804, whisper_loss=0.1145, over 20339.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01216, ecapa_loss=0.0002757, whisper_loss=0.09762, over 3875103.35 frames. ], batch size: 81, lr: 1.58e-02, grad_scale: 134217728.0 2024-08-10 08:15:44,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=458270.0, ans=0.07 2024-08-10 08:15:48,109 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 08:15:48,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=458270.0, ans=0.2 2024-08-10 08:15:53,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458370.0, ans=0.125 2024-08-10 08:16:04,860 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 08:16:13,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=458470.0, ans=0.0 2024-08-10 08:16:32,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458570.0, ans=0.1 2024-08-10 08:16:56,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2400, loss[loss=0.1007, beats_loss=0.0134, ecapa_loss=0.0002638, whisper_loss=0.08467, over 16248.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01205, ecapa_loss=0.0002779, whisper_loss=0.09746, over 3868422.70 frames. ], batch size: 65, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:16:59,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=458770.0, ans=0.0 2024-08-10 08:17:03,419 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.640e+05 2024-08-10 08:17:12,465 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 08:17:16,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.22 vs. limit=15.0 2024-08-10 08:17:29,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.868e+01 3.229e+01 3.686e+01 5.514e+01, threshold=6.458e+01, percent-clipped=0.0 2024-08-10 08:17:35,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=458970.0, ans=0.125 2024-08-10 08:17:39,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=459070.0, ans=0.125 2024-08-10 08:17:56,184 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 08:18:01,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=459170.0, ans=0.0 2024-08-10 08:18:06,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459170.0, ans=0.125 2024-08-10 08:18:18,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2450, loss[loss=0.09692, beats_loss=0.01398, ecapa_loss=0.0002541, whisper_loss=0.0804, over 19834.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01214, ecapa_loss=0.0002751, whisper_loss=0.09669, over 3867832.45 frames. ], batch size: 79, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:18:24,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=459270.0, ans=0.2 2024-08-10 08:18:27,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=459270.0, ans=0.2 2024-08-10 08:18:29,796 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 16 from LS+wenet, 30 from Vox, 47 fro AS 2024-08-10 08:18:41,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=459370.0, ans=0.125 2024-08-10 08:18:59,039 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 08:19:06,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-08-10 08:19:19,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=459570.0, ans=0.125 2024-08-10 08:19:41,788 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2500, loss[loss=0.1177, beats_loss=0.01123, ecapa_loss=0.0002336, whisper_loss=0.1042, over 21297.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.0121, ecapa_loss=0.0002761, whisper_loss=0.09689, over 3866963.44 frames. ], batch size: 84, lr: 1.57e-02, grad_scale: 134217728.0 2024-08-10 08:19:41,994 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 08:19:54,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=12.0 2024-08-10 08:20:10,702 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 08:20:13,526 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-10 08:20:29,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2024-08-10 08:20:31,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+01 2.999e+01 3.542e+01 3.925e+01 6.520e+01, threshold=7.085e+01, percent-clipped=1.0 2024-08-10 08:20:35,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.461e-02 2024-08-10 08:20:50,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=460070.0, ans=0.125 2024-08-10 08:21:08,080 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 08:21:25,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2550, loss[loss=0.108, beats_loss=0.01084, ecapa_loss=0.0002552, whisper_loss=0.09462, over 16500.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01214, ecapa_loss=0.0002746, whisper_loss=0.09711, over 3875367.01 frames. 
], batch size: 62, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:21:34,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=460270.0, ans=0.0 2024-08-10 08:21:51,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-08-10 08:21:53,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2024-08-10 08:22:01,146 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-10 08:22:17,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=460470.0, ans=0.125 2024-08-10 08:22:20,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=460470.0, ans=0.125 2024-08-10 08:22:28,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=460570.0, ans=0.0 2024-08-10 08:22:36,165 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 08:22:41,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460570.0, ans=0.125 2024-08-10 08:22:44,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-08-10 08:22:47,047 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 08:22:59,862 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 08:23:04,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=460670.0, ans=0.09899494936611666 2024-08-10 08:23:08,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2600, loss[loss=0.09542, beats_loss=0.0145, ecapa_loss=0.0002631, whisper_loss=0.07829, over 19091.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01214, ecapa_loss=0.0002754, whisper_loss=0.09674, over 3842635.32 frames. ], batch size: 80, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:23:13,352 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 08:23:27,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=460770.0, ans=0.125 2024-08-10 08:24:01,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 3.079e+01 3.425e+01 3.855e+01 5.495e+01, threshold=6.850e+01, percent-clipped=0.0 2024-08-10 08:24:10,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-10 08:24:22,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461070.0, ans=0.1 2024-08-10 08:24:37,695 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 08:24:50,173 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 08:24:50,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=461170.0, ans=0.125 2024-08-10 08:25:03,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2650, loss[loss=0.1156, beats_loss=0.01019, ecapa_loss=0.0003472, whisper_loss=0.1019, over 15508.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01211, ecapa_loss=0.0002761, whisper_loss=0.0968, over 3843057.77 frames. ], batch size: 64, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:25:04,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461270.0, ans=0.125 2024-08-10 08:25:21,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=461270.0, ans=0.0 2024-08-10 08:26:02,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-08-10 08:26:03,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461470.0, ans=0.1 2024-08-10 08:26:11,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461570.0, ans=0.1 2024-08-10 08:26:28,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=461570.0, ans=0.125 2024-08-10 08:26:39,963 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 08:26:57,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2700, loss[loss=0.08669, beats_loss=0.01407, ecapa_loss=0.0002042, whisper_loss=0.07058, over 14363.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01209, ecapa_loss=0.0002807, whisper_loss=0.09649, over 3844886.35 frames. ], batch size: 58, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:27:05,481 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 08:27:06,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. 
limit=15.0 2024-08-10 08:27:20,725 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 08:27:20,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=461870.0, ans=0.125 2024-08-10 08:27:23,267 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 26 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-10 08:27:41,070 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 08:27:48,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+01 3.222e+01 3.601e+01 4.234e+01 3.838e+02, threshold=7.201e+01, percent-clipped=7.0 2024-08-10 08:27:56,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=461970.0, ans=0.125 2024-08-10 08:28:03,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462070.0, ans=0.1 2024-08-10 08:28:29,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=462270.0, ans=0.125 2024-08-10 08:28:30,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2750, loss[loss=0.1192, beats_loss=0.009505, ecapa_loss=0.0003321, whisper_loss=0.1063, over 23033.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01207, ecapa_loss=0.0002805, whisper_loss=0.09706, over 3843145.05 frames. ], batch size: 94, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:28:32,404 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 08:28:48,572 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 08:28:57,359 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 08:28:58,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=462470.0, ans=0.07 2024-08-10 08:29:04,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=462470.0, ans=0.0 2024-08-10 08:29:15,977 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 08:29:22,532 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.096e-03 2024-08-10 08:29:25,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=462570.0, ans=0.0 2024-08-10 08:29:26,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=462570.0, ans=0.0 2024-08-10 08:29:34,045 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 08:29:45,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2800, loss[loss=0.09599, beats_loss=0.01565, ecapa_loss=0.0001912, whisper_loss=0.07843, over 15906.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01205, ecapa_loss=0.0002782, whisper_loss=0.09694, over 3830846.85 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:29:49,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=462770.0, ans=6.0 2024-08-10 08:30:02,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462870.0, ans=0.1 2024-08-10 08:30:04,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. 
limit=22.5 2024-08-10 08:30:05,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=462870.0, ans=0.015 2024-08-10 08:30:19,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=462970.0, ans=0.0 2024-08-10 08:30:19,949 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+01 3.197e+01 3.685e+01 4.218e+01 5.823e+01, threshold=7.371e+01, percent-clipped=0.0 2024-08-10 08:31:01,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2850, loss[loss=0.125, beats_loss=0.01037, ecapa_loss=0.0002931, whisper_loss=0.1117, over 18225.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01201, ecapa_loss=0.0002776, whisper_loss=0.09688, over 3818295.47 frames. ], batch size: 71, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:31:04,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-10 08:31:10,077 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 08:31:18,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2024-08-10 08:31:35,407 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 08:31:35,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-08-10 08:31:58,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-10 08:32:08,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.21 vs. 
limit=12.0 2024-08-10 08:32:24,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2900, loss[loss=0.1083, beats_loss=0.01258, ecapa_loss=0.0003206, whisper_loss=0.09248, over 21871.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01203, ecapa_loss=0.0002797, whisper_loss=0.09716, over 3829588.00 frames. ], batch size: 93, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:32:41,006 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 08:32:51,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=463870.0, ans=0.125 2024-08-10 08:33:00,026 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 08:33:05,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.474e+01 3.004e+01 3.404e+01 3.788e+01 1.422e+02, threshold=6.807e+01, percent-clipped=1.0 2024-08-10 08:33:08,013 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 08:33:14,372 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 08:33:18,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=464070.0, ans=0.2 2024-08-10 08:33:19,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=464070.0, ans=0.125 2024-08-10 08:33:33,696 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-10 08:33:55,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 2950, loss[loss=0.09829, beats_loss=0.01586, ecapa_loss=0.0002212, whisper_loss=0.08021, over 22965.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.0122, ecapa_loss=0.0002793, whisper_loss=0.09627, over 3849059.79 frames. 
], batch size: 90, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:33:57,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=464270.0, ans=0.0 2024-08-10 08:33:57,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=464270.0, ans=10.0 2024-08-10 08:34:16,756 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 08:34:41,504 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 08:34:45,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=464470.0, ans=0.125 2024-08-10 08:35:14,655 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-10 08:35:27,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3000, loss[loss=0.1153, beats_loss=0.01106, ecapa_loss=0.0002754, whisper_loss=0.1014, over 23138.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01221, ecapa_loss=0.0002773, whisper_loss=0.09667, over 3902997.52 frames. ], batch size: 89, lr: 1.57e-02, grad_scale: 268435456.0 2024-08-10 08:35:27,863 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 08:36:05,675 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on ASR_libri: loss=0.2648, beats_loss=0, ecapa_loss=0.0008316, whisper_loss=0.2565, over 922467.00 frames. 2024-08-10 08:36:23,281 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on SV_voxceleb1: loss=0.007277, beats_loss=0, ecapa_loss=0.0007277, whisper_loss=0, over 939242.00 frames. 2024-08-10 08:38:19,691 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on AT_audioset: loss=0.0279, beats_loss=0.0279, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 08:38:19,696 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 08:38:53,014 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 08:38:57,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 3.167e+01 3.615e+01 4.298e+01 8.066e+01, threshold=7.230e+01, percent-clipped=1.0 2024-08-10 08:39:00,522 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 08:39:27,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=465170.0, ans=0.125 2024-08-10 08:39:31,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465170.0, ans=0.1 2024-08-10 08:39:40,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3050, loss[loss=0.08881, beats_loss=0.01469, ecapa_loss=0.0002384, whisper_loss=0.07173, over 23188.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01224, ecapa_loss=0.0002779, whisper_loss=0.09669, over 3889433.56 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:39:48,852 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 08:40:05,054 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 08:40:07,942 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 08:40:17,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. 
limit=15.0 2024-08-10 08:40:57,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=465670.0, ans=0.0 2024-08-10 08:41:02,305 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 08:41:02,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-08-10 08:41:03,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3100, loss[loss=0.1054, beats_loss=0.01336, ecapa_loss=0.0002104, whisper_loss=0.08998, over 18687.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01225, ecapa_loss=0.0002792, whisper_loss=0.09729, over 3896409.64 frames. ], batch size: 72, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:41:16,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465770.0, ans=0.1 2024-08-10 08:41:20,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=465870.0, ans=0.2 2024-08-10 08:41:43,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 3.398e+01 3.878e+01 4.582e+01 1.719e+02, threshold=7.756e+01, percent-clipped=2.0 2024-08-10 08:41:57,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=466070.0, ans=0.04949747468305833 2024-08-10 08:42:18,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=466170.0, ans=0.0 2024-08-10 08:42:33,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3150, loss[loss=0.1038, beats_loss=0.01104, ecapa_loss=0.0004185, whisper_loss=0.0886, over 15711.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01225, ecapa_loss=0.0002794, whisper_loss=0.09727, over 3876330.57 frames. 
], batch size: 68, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:42:36,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=466270.0, ans=0.0 2024-08-10 08:42:46,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=466270.0, ans=0.0 2024-08-10 08:42:57,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.44 vs. limit=22.5 2024-08-10 08:43:01,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466370.0, ans=0.1 2024-08-10 08:43:04,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=466370.0, ans=0.0 2024-08-10 08:43:06,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=466470.0, ans=0.0 2024-08-10 08:43:08,349 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.812e-02 2024-08-10 08:43:27,523 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 08:43:46,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466670.0, ans=0.125 2024-08-10 08:43:46,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=466670.0, ans=0.125 2024-08-10 08:43:52,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=466670.0, ans=0.125 2024-08-10 08:43:57,159 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 08:43:58,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3200, loss[loss=0.1283, beats_loss=0.009729, ecapa_loss=0.0003345, whisper_loss=0.1152, over 21521.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01217, ecapa_loss=0.0002783, whisper_loss=0.0978, over 3878383.71 frames. ], batch size: 84, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:44:10,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=466770.0, ans=0.125 2024-08-10 08:44:24,415 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 08:44:35,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=466970.0, ans=0.1 2024-08-10 08:44:40,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 3.101e+01 3.705e+01 4.309e+01 1.166e+02, threshold=7.411e+01, percent-clipped=1.0 2024-08-10 08:44:45,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=466970.0, ans=0.04949747468305833 2024-08-10 08:44:55,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=467070.0, ans=0.1 2024-08-10 08:45:03,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=467070.0, ans=0.1 2024-08-10 08:45:27,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=467170.0, ans=15.0 2024-08-10 08:45:32,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3250, loss[loss=0.1002, beats_loss=0.01106, ecapa_loss=0.0003663, whisper_loss=0.08549, over 19439.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01208, ecapa_loss=0.0002793, whisper_loss=0.09815, over 3873678.78 frames. 
], batch size: 81, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:45:42,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-10 08:45:55,431 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 08:45:58,512 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 08:46:06,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=467370.0, ans=0.0 2024-08-10 08:46:09,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0 2024-08-10 08:46:20,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=467470.0, ans=0.0 2024-08-10 08:46:56,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-10 08:47:00,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=467670.0, ans=0.2 2024-08-10 08:47:02,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=467770.0, ans=0.0 2024-08-10 08:47:04,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3300, loss[loss=0.1255, beats_loss=0.0131, ecapa_loss=0.0002735, whisper_loss=0.1096, over 23445.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01223, ecapa_loss=0.0002769, whisper_loss=0.0969, over 3844781.55 frames. 
], batch size: 94, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:47:04,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=467770.0, ans=0.125 2024-08-10 08:47:16,130 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 08:47:30,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.82 vs. limit=15.0 2024-08-10 08:47:32,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=467870.0, ans=0.2 2024-08-10 08:47:46,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 3.041e+01 3.344e+01 3.812e+01 6.169e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 08:48:03,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468070.0, ans=0.0 2024-08-10 08:48:14,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=468170.0, ans=0.125 2024-08-10 08:48:24,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=468170.0, ans=0.1 2024-08-10 08:48:34,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3350, loss[loss=0.1156, beats_loss=0.01245, ecapa_loss=0.0002687, whisper_loss=0.1005, over 15450.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01213, ecapa_loss=0.0002789, whisper_loss=0.09784, over 3872233.09 frames. ], batch size: 61, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:48:34,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=468270.0, ans=0.0 2024-08-10 08:48:48,682 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
42 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 08:49:14,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=468470.0, ans=0.125 2024-08-10 08:49:16,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-10 08:49:31,304 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 08:49:46,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=468670.0, ans=0.0 2024-08-10 08:49:58,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-10 08:49:58,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3400, loss[loss=0.1156, beats_loss=0.01203, ecapa_loss=0.0002584, whisper_loss=0.101, over 22434.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01209, ecapa_loss=0.0002776, whisper_loss=0.09851, over 3912439.24 frames. ], batch size: 86, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:50:34,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 3.156e+01 3.587e+01 4.181e+01 1.855e+02, threshold=7.174e+01, percent-clipped=2.0 2024-08-10 08:50:34,405 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 08:50:39,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=468970.0, ans=0.05 2024-08-10 08:50:53,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469070.0, ans=0.1 2024-08-10 08:51:15,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3450, loss[loss=0.1145, beats_loss=0.0126, ecapa_loss=0.0002388, whisper_loss=0.09956, over 23300.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01209, ecapa_loss=0.0002819, whisper_loss=0.0978, over 3910697.48 frames. ], batch size: 90, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:51:22,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-08-10 08:51:23,577 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-10 08:51:39,009 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 08:51:42,219 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 08:51:43,467 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 08:51:51,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=469470.0, ans=0.0 2024-08-10 08:51:56,887 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.078e+00 2024-08-10 08:52:04,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-10 08:52:08,554 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 08:52:14,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=469670.0, ans=0.125 2024-08-10 08:52:17,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-10 08:52:18,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2024-08-10 08:52:21,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=469670.0, ans=0.2 2024-08-10 08:52:28,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=469770.0, ans=0.2 2024-08-10 08:52:29,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3500, loss[loss=0.1104, beats_loss=0.01254, ecapa_loss=0.0002363, whisper_loss=0.09554, over 23788.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01206, ecapa_loss=0.0002797, whisper_loss=0.09777, over 3890552.88 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:52:29,914 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
37 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 08:52:55,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=469870.0, ans=0.125 2024-08-10 08:53:04,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.316e+01 3.037e+01 3.390e+01 3.981e+01 6.541e+01, threshold=6.780e+01, percent-clipped=0.0 2024-08-10 08:53:16,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=470070.0, ans=0.0 2024-08-10 08:53:21,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2024-08-10 08:53:25,424 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-10 08:53:32,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=470170.0, ans=0.125 2024-08-10 08:53:44,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3550, loss[loss=0.1113, beats_loss=0.01421, ecapa_loss=0.0002047, whisper_loss=0.09499, over 24346.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01212, ecapa_loss=0.0002773, whisper_loss=0.09748, over 3886230.58 frames. ], batch size: 93, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:53:51,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=470270.0, ans=0.125 2024-08-10 08:53:54,101 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-10 08:54:07,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=470370.0, ans=0.0 2024-08-10 08:54:28,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=470570.0, ans=0.125 2024-08-10 08:54:41,316 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 08:54:44,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470670.0, ans=0.1 2024-08-10 08:54:54,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470670.0, ans=0.1 2024-08-10 08:54:57,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3600, loss[loss=0.1292, beats_loss=0.01119, ecapa_loss=0.0002115, whisper_loss=0.1159, over 15661.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01212, ecapa_loss=0.0002759, whisper_loss=0.09807, over 3892895.74 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 268435456.0 2024-08-10 08:55:00,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=470770.0, ans=0.2 2024-08-10 08:55:24,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=470970.0, ans=0.125 2024-08-10 08:55:29,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.993e+01 3.332e+01 3.946e+01 5.463e+01, threshold=6.665e+01, percent-clipped=0.0 2024-08-10 08:55:36,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. 
limit=15.0 2024-08-10 08:55:40,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-10 08:55:44,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=471070.0, ans=0.0 2024-08-10 08:55:53,859 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 08:55:57,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-08-10 08:56:09,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-08-10 08:56:10,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3650, loss[loss=0.125, beats_loss=0.01245, ecapa_loss=0.0003179, whisper_loss=0.1093, over 23290.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01209, ecapa_loss=0.0002783, whisper_loss=0.09813, over 3868759.02 frames. ], batch size: 94, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:56:13,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471270.0, ans=0.0 2024-08-10 08:56:31,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-10 08:56:43,945 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 08:56:45,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=471470.0, ans=0.125 2024-08-10 08:56:48,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=471470.0, ans=0.125 2024-08-10 08:56:54,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=471570.0, ans=0.0 2024-08-10 08:57:01,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=471570.0, ans=0.125 2024-08-10 08:57:05,387 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 08:57:05,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=471670.0, ans=0.0 2024-08-10 08:57:09,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471670.0, ans=0.1 2024-08-10 08:57:10,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471670.0, ans=0.1 2024-08-10 08:57:13,430 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 08:57:20,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3700, loss[loss=0.1027, beats_loss=0.01177, ecapa_loss=0.0003472, whisper_loss=0.08749, over 14453.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01218, ecapa_loss=0.0002786, whisper_loss=0.09747, over 3833046.41 frames. 
], batch size: 59, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:57:22,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=471770.0, ans=0.125 2024-08-10 08:57:32,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=471870.0, ans=0.0 2024-08-10 08:57:34,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=471870.0, ans=0.125 2024-08-10 08:57:51,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+01 3.070e+01 3.607e+01 4.290e+01 1.526e+02, threshold=7.214e+01, percent-clipped=4.0 2024-08-10 08:57:58,715 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 21 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-10 08:58:17,517 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.887e-01 2024-08-10 08:58:23,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=472170.0, ans=0.0 2024-08-10 08:58:27,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3750, loss[loss=0.1188, beats_loss=0.01152, ecapa_loss=0.0003206, whisper_loss=0.1041, over 21455.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01219, ecapa_loss=0.0002786, whisper_loss=0.09786, over 3834653.39 frames. 
], batch size: 88, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:58:34,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=472270.0, ans=0.04949747468305833 2024-08-10 08:58:43,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=472370.0, ans=0.2 2024-08-10 08:58:52,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=472370.0, ans=0.125 2024-08-10 08:59:13,139 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 08:59:13,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=472570.0, ans=0.125 2024-08-10 08:59:15,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-08-10 08:59:24,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=472570.0, ans=0.1 2024-08-10 08:59:31,488 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-10 08:59:40,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3800, loss[loss=0.09932, beats_loss=0.01506, ecapa_loss=0.0002309, whisper_loss=0.08196, over 21896.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01234, ecapa_loss=0.0002784, whisper_loss=0.09634, over 3830392.49 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 08:59:47,093 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-10 08:59:55,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. 
limit=6.0 2024-08-10 09:00:03,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=472870.0, ans=0.2 2024-08-10 09:00:13,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.482e+01 3.142e+01 3.387e+01 4.333e+01 6.143e+01, threshold=6.774e+01, percent-clipped=0.0 2024-08-10 09:00:22,081 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 from AS 2024-08-10 09:00:36,152 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 09:00:41,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=473170.0, ans=0.125 2024-08-10 09:00:48,471 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 09:00:51,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0 2024-08-10 09:00:52,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3850, loss[loss=0.1266, beats_loss=0.01418, ecapa_loss=0.0002462, whisper_loss=0.1099, over 15064.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01226, ecapa_loss=0.0002774, whisper_loss=0.09724, over 3840828.37 frames. ], batch size: 61, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:00:53,678 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 09:01:04,669 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 09:01:04,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=473270.0, ans=0.2 2024-08-10 09:01:17,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473370.0, ans=0.1 2024-08-10 09:01:31,464 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 09:01:48,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=473570.0, ans=0.125 2024-08-10 09:01:59,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2024-08-10 09:02:01,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=473670.0, ans=0.125 2024-08-10 09:02:04,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3900, loss[loss=0.1143, beats_loss=0.01517, ecapa_loss=0.0002491, whisper_loss=0.09665, over 18802.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01222, ecapa_loss=0.0002777, whisper_loss=0.09735, over 3872130.68 frames. ], batch size: 75, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:02:09,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=473770.0, ans=0.0 2024-08-10 09:02:25,036 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
19 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 09:02:38,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+01 3.153e+01 3.691e+01 4.376e+01 6.503e+01, threshold=7.382e+01, percent-clipped=0.0 2024-08-10 09:02:46,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=473970.0, ans=0.125 2024-08-10 09:02:50,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-08-10 09:02:52,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=474070.0, ans=0.125 2024-08-10 09:03:01,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=474070.0, ans=10.0 2024-08-10 09:03:08,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=474170.0, ans=0.125 2024-08-10 09:03:08,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474170.0, ans=0.1 2024-08-10 09:03:17,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 3950, loss[loss=0.1073, beats_loss=0.01181, ecapa_loss=0.0002834, whisper_loss=0.09267, over 22072.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0122, ecapa_loss=0.0002804, whisper_loss=0.09739, over 3883187.01 frames. ], batch size: 86, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:03:38,127 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 09:03:39,587 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 09:03:44,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474370.0, ans=0.0 2024-08-10 09:03:46,549 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 12 from LS+wenet, 16 from Vox, 40 from AS 2024-08-10 09:03:58,234 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 from AS 2024-08-10 09:03:59,772 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 09:04:06,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-10 09:04:18,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=474670.0, ans=0.125 2024-08-10 09:04:28,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4000, loss[loss=0.1209, beats_loss=0.008156, ecapa_loss=0.0003099, whisper_loss=0.1096, over 19342.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01222, ecapa_loss=0.0002798, whisper_loss=0.09684, over 3867049.43 frames. ], batch size: 72, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:04:33,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=474770.0, ans=0.0 2024-08-10 09:04:38,342 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 09:04:38,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=474770.0, ans=0.125 2024-08-10 09:04:40,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=474770.0, ans=0.015 2024-08-10 09:04:50,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474870.0, ans=0.0 2024-08-10 09:05:02,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 3.204e+01 3.613e+01 4.111e+01 7.755e+01, threshold=7.226e+01, percent-clipped=1.0 2024-08-10 09:05:02,901 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS 2024-08-10 09:05:18,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475070.0, ans=0.1 2024-08-10 09:05:32,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=475170.0, ans=0.125 2024-08-10 09:05:32,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5 2024-08-10 09:05:43,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4050, loss[loss=0.128, beats_loss=0.009753, ecapa_loss=0.0002518, whisper_loss=0.1157, over 23308.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01219, ecapa_loss=0.0002767, whisper_loss=0.09765, over 3913860.19 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:05:51,912 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
16 from LS+wenet, 20 from Vox, 46 from AS 2024-08-10 09:05:52,210 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.223e+00 2024-08-10 09:06:32,725 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 from AS 2024-08-10 09:06:47,081 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 from AS 2024-08-10 09:06:57,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4100, loss[loss=0.1193, beats_loss=0.01356, ecapa_loss=0.0002338, whisper_loss=0.1034, over 21475.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.01222, ecapa_loss=0.0002749, whisper_loss=0.09747, over 3904923.79 frames. ], batch size: 84, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:07:10,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=475770.0, ans=0.0 2024-08-10 09:07:16,553 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 14 from Vox, 38 from AS 2024-08-10 09:07:30,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=475970.0, ans=0.0 2024-08-10 09:07:34,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.441e+01 3.046e+01 3.447e+01 3.852e+01 5.765e+01, threshold=6.895e+01, percent-clipped=0.0 2024-08-10 09:07:47,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=476070.0, ans=0.5 2024-08-10 09:07:52,414 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 09:07:59,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=476170.0, ans=0.125 2024-08-10 09:08:00,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=476170.0, ans=0.0 2024-08-10 09:08:04,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=476170.0, ans=0.0 2024-08-10 09:08:15,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4150, loss[loss=0.1329, beats_loss=0.01076, ecapa_loss=0.0002618, whisper_loss=0.1196, over 19297.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01223, ecapa_loss=0.0002747, whisper_loss=0.09732, over 3919168.19 frames. ], batch size: 75, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:08:15,477 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 09:08:22,966 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 from AS 2024-08-10 09:08:43,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=476370.0, ans=0.125 2024-08-10 09:09:30,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476670.0, ans=0.1 2024-08-10 09:09:34,922 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 09:09:38,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4200, loss[loss=0.1095, beats_loss=0.0154, ecapa_loss=0.0002354, whisper_loss=0.09172, over 22064.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01233, ecapa_loss=0.0002737, whisper_loss=0.09705, over 3942139.48 frames. 
], batch size: 86, lr: 1.55e-02, grad_scale: 268435456.0 2024-08-10 09:09:46,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=476770.0, ans=0.0 2024-08-10 09:10:10,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476970.0, ans=0.1 2024-08-10 09:10:11,833 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 09:10:12,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.087e+01 3.164e+01 3.633e+01 4.360e+01 6.348e+01, threshold=7.265e+01, percent-clipped=0.0 2024-08-10 09:10:15,111 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 from AS 2024-08-10 09:10:18,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=476970.0, ans=0.04949747468305833 2024-08-10 09:10:23,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=477070.0, ans=0.2 2024-08-10 09:10:28,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477070.0, ans=0.1 2024-08-10 09:10:40,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=477170.0, ans=0.1 2024-08-10 09:10:46,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=477170.0, ans=0.95 2024-08-10 09:10:54,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477170.0, ans=0.0 2024-08-10 09:10:55,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, 
batch_count=477270.0, ans=0.1 2024-08-10 09:10:56,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4250, loss[loss=0.1035, beats_loss=0.0162, ecapa_loss=0.0002159, whisper_loss=0.08512, over 23195.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.0124, ecapa_loss=0.0002711, whisper_loss=0.09639, over 3950380.01 frames. ], batch size: 93, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:11:52,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=477570.0, ans=0.125 2024-08-10 09:11:52,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=12.0 2024-08-10 09:12:10,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=477670.0, ans=0.0 2024-08-10 09:12:16,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4300, loss[loss=0.09698, beats_loss=0.01137, ecapa_loss=0.0002356, whisper_loss=0.08326, over 22441.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01238, ecapa_loss=0.0002729, whisper_loss=0.09559, over 3935867.50 frames. ], batch size: 85, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:12:22,254 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
24 from LS+wenet, 22 from Vox, 28 from AS 2024-08-10 09:12:40,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=477870.0, ans=0.2 2024-08-10 09:12:54,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.896e+01 3.194e+01 3.711e+01 5.609e+01, threshold=6.388e+01, percent-clipped=0.0 2024-08-10 09:13:01,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477970.0, ans=0.1 2024-08-10 09:13:17,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=12.0 2024-08-10 09:13:21,029 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 from AS 2024-08-10 09:13:34,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4350, loss[loss=0.1182, beats_loss=0.01206, ecapa_loss=0.0002633, whisper_loss=0.1035, over 20529.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01224, ecapa_loss=0.0002768, whisper_loss=0.09567, over 3908739.46 frames. ], batch size: 80, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:13:36,454 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 20 from Vox, 25 from AS 2024-08-10 09:13:41,580 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 09:13:49,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2024-08-10 09:13:52,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.40 vs. 
limit=22.5 2024-08-10 09:13:54,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=478370.0, ans=0.0 2024-08-10 09:13:59,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-10 09:13:59,939 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 09:14:08,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-10 09:14:17,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=478470.0, ans=0.125 2024-08-10 09:14:28,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=8.0 2024-08-10 09:14:38,854 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 09:14:45,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478670.0, ans=0.125 2024-08-10 09:14:48,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=478670.0, ans=0.125 2024-08-10 09:14:51,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4400, loss[loss=0.1017, beats_loss=0.0151, ecapa_loss=0.0002408, whisper_loss=0.08423, over 20557.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01224, ecapa_loss=0.0002757, whisper_loss=0.09616, over 3881706.42 frames. ], batch size: 84, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:15:02,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.20 vs. 
limit=15.0 2024-08-10 09:15:15,136 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 from AS 2024-08-10 09:15:25,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+01 3.041e+01 3.447e+01 3.976e+01 9.860e+01, threshold=6.894e+01, percent-clipped=1.0 2024-08-10 09:15:32,167 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 09:15:42,706 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 12 from Vox, 33 from AS 2024-08-10 09:15:48,092 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 from AS 2024-08-10 09:16:04,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4450, loss[loss=0.1211, beats_loss=0.01197, ecapa_loss=0.0002785, whisper_loss=0.1063, over 22566.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01224, ecapa_loss=0.0002781, whisper_loss=0.09615, over 3876137.41 frames. ], batch size: 91, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:16:17,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=479370.0, ans=0.0 2024-08-10 09:16:22,724 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 09:16:28,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=479370.0, ans=0.0 2024-08-10 09:16:57,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479670.0, ans=0.1 2024-08-10 09:17:08,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=479670.0, ans=0.125 2024-08-10 09:17:11,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4500, loss[loss=0.1151, beats_loss=0.0115, ecapa_loss=0.0003015, whisper_loss=0.1006, over 21066.00 frames. 
], tot_loss[loss=0.1117, beats_loss=0.01216, ecapa_loss=0.0002791, whisper_loss=0.09673, over 3857364.69 frames. ], batch size: 82, lr: 1.54e-02, grad_scale: 268435456.0 2024-08-10 09:17:26,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=479870.0, ans=0.0 2024-08-10 09:17:31,460 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 from AS 2024-08-10 09:17:32,774 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 from AS 2024-08-10 09:17:34,039 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 09:17:35,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=479870.0, ans=0.0 2024-08-10 09:17:40,259 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-48000.pt 2024-08-10 09:17:44,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.468e+01 3.224e+01 3.675e+01 4.252e+01 6.669e+01, threshold=7.350e+01, percent-clipped=1.0 2024-08-10 09:18:05,408 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 09:18:08,309 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
33 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 09:18:10,044 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.033e+01 2024-08-10 09:18:10,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=480170.0, ans=0.125 2024-08-10 09:18:11,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=480170.0, ans=0.125 2024-08-10 09:18:15,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480170.0, ans=0.0 2024-08-10 09:18:19,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=480270.0, ans=0.07 2024-08-10 09:18:20,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4550, loss[loss=0.1296, beats_loss=0.009912, ecapa_loss=0.0003029, whisper_loss=0.1167, over 18220.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01205, ecapa_loss=0.0002785, whisper_loss=0.09814, over 3885050.95 frames. ], batch size: 69, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:18:23,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=480270.0, ans=0.125 2024-08-10 09:19:01,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480570.0, ans=0.125 2024-08-10 09:19:15,820 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 09:19:21,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=480670.0, ans=0.125 2024-08-10 09:19:27,565 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4600, loss[loss=0.08505, beats_loss=0.01673, ecapa_loss=0.0002317, whisper_loss=0.06601, over 22529.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01212, ecapa_loss=0.0002778, whisper_loss=0.09764, over 3912544.31 frames. ], batch size: 94, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:19:31,575 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 41 from LS+wenet, 18 from Vox, 34 from AS 2024-08-10 09:19:35,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480770.0, ans=0.1 2024-08-10 09:19:41,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=480870.0, ans=0.125 2024-08-10 09:19:51,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=480870.0, ans=0.2 2024-08-10 09:19:53,297 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 09:19:56,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=15.0 2024-08-10 09:19:56,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 3.144e+01 3.622e+01 4.296e+01 6.398e+01, threshold=7.244e+01, percent-clipped=0.0 2024-08-10 09:20:04,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=480970.0, ans=0.125 2024-08-10 09:20:09,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481070.0, ans=0.1 2024-08-10 09:20:11,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=481070.0, ans=0.2 2024-08-10 09:20:17,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=481070.0, ans=0.0 2024-08-10 09:20:21,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.17 vs. limit=15.0 2024-08-10 09:20:22,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481170.0, ans=0.125 2024-08-10 09:20:26,264 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 09:20:29,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.62 vs. limit=22.5 2024-08-10 09:20:31,982 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 09:20:32,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4650, loss[loss=0.1187, beats_loss=0.01225, ecapa_loss=0.000292, whisper_loss=0.1035, over 21076.00 frames. ], tot_loss[loss=0.1124, beats_loss=0.0121, ecapa_loss=0.0002783, whisper_loss=0.09752, over 3896189.23 frames. 
], batch size: 89, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:20:39,241 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 09:20:39,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.37 vs. limit=22.5 2024-08-10 09:20:43,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=481270.0, ans=0.1 2024-08-10 09:20:46,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481370.0, ans=0.1 2024-08-10 09:21:00,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2024-08-10 09:21:02,718 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 from AS 2024-08-10 09:21:05,391 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 from AS 2024-08-10 09:21:10,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=481570.0, ans=0.125 2024-08-10 09:21:19,280 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 09:21:28,530 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 09:21:30,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2024-08-10 09:21:37,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4700, loss[loss=0.1249, beats_loss=0.007389, ecapa_loss=0.0003147, whisper_loss=0.1143, over 18464.00 frames. ], tot_loss[loss=0.113, beats_loss=0.0121, ecapa_loss=0.0002778, whisper_loss=0.09811, over 3897731.35 frames. 
], batch size: 71, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:21:41,165 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 09:21:43,933 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 from AS 2024-08-10 09:21:46,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=481770.0, ans=0.125 2024-08-10 09:21:48,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-10 09:21:56,555 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 from AS 2024-08-10 09:22:07,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.494e+01 3.095e+01 3.461e+01 3.864e+01 6.358e+01, threshold=6.922e+01, percent-clipped=0.0 2024-08-10 09:22:10,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2024-08-10 09:22:27,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=482070.0, ans=0.125 2024-08-10 09:22:42,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4750, loss[loss=0.1019, beats_loss=0.01516, ecapa_loss=0.0002632, whisper_loss=0.0841, over 18712.00 frames. ], tot_loss[loss=0.1134, beats_loss=0.01209, ecapa_loss=0.0002761, whisper_loss=0.09852, over 3881900.47 frames. ], batch size: 75, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:23:08,042 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
20 from LS+wenet, 15 from Vox, 19 from AS 2024-08-10 09:23:10,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=482470.0, ans=0.125 2024-08-10 09:23:18,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=482470.0, ans=0.0 2024-08-10 09:23:21,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=482570.0, ans=0.2 2024-08-10 09:23:23,230 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 15 from Vox, 43 from AS 2024-08-10 09:23:33,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=482670.0, ans=0.125 2024-08-10 09:23:39,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=482670.0, ans=0.125 2024-08-10 09:23:45,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4800, loss[loss=0.1211, beats_loss=0.01128, ecapa_loss=0.000271, whisper_loss=0.1071, over 22579.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01215, ecapa_loss=0.000277, whisper_loss=0.09785, over 3891656.73 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:24:04,947 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 12 from Vox, 37 from AS 2024-08-10 09:24:10,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=482970.0, ans=0.05 2024-08-10 09:24:11,313 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 09:24:11,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=482970.0, ans=0.025 2024-08-10 09:24:14,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+01 3.085e+01 3.419e+01 4.209e+01 9.011e+01, threshold=6.838e+01, percent-clipped=2.0 2024-08-10 09:24:21,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-08-10 09:24:22,874 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 09:24:29,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-10 09:24:49,341 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4850, loss[loss=0.1019, beats_loss=0.01227, ecapa_loss=0.000312, whisper_loss=0.08652, over 15402.00 frames. ], tot_loss[loss=0.1128, beats_loss=0.01226, ecapa_loss=0.0002762, whisper_loss=0.09776, over 3920075.39 frames. ], batch size: 66, lr: 1.54e-02, grad_scale: 536870912.0 2024-08-10 09:24:52,956 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 09:25:11,595 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-10 09:25:21,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.43 vs. 
limit=10.0 2024-08-10 09:25:27,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=483470.0, ans=0.2 2024-08-10 09:25:32,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-08-10 09:25:40,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483570.0, ans=0.1 2024-08-10 09:25:54,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483670.0, ans=0.125 2024-08-10 09:26:01,244 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 09:26:02,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4900, loss[loss=0.1043, beats_loss=0.01261, ecapa_loss=0.0002874, whisper_loss=0.0888, over 20516.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01223, ecapa_loss=0.0002759, whisper_loss=0.09752, over 3883520.79 frames. ], batch size: 81, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:26:16,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=483870.0, ans=0.125 2024-08-10 09:26:19,487 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 09:26:39,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 3.191e+01 3.639e+01 4.118e+01 6.849e+01, threshold=7.278e+01, percent-clipped=1.0 2024-08-10 09:26:46,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2024-08-10 09:27:10,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=484170.0, ans=0.125 2024-08-10 09:27:16,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=484170.0, ans=0.2 2024-08-10 09:27:28,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2024-08-10 09:27:29,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 4950, loss[loss=0.059, beats_loss=0.01707, ecapa_loss=0.000198, whisper_loss=0.03996, over 13680.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.0122, ecapa_loss=0.0002758, whisper_loss=0.09727, over 3871324.21 frames. ], batch size: 54, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:27:44,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=484270.0, ans=0.0 2024-08-10 09:27:48,331 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 09:28:01,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2024-08-10 09:28:19,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=484470.0, ans=0.125 2024-08-10 09:28:20,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-08-10 09:28:34,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.21 vs. 
limit=15.0 2024-08-10 09:28:35,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=484570.0, ans=0.125 2024-08-10 09:28:52,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=484670.0, ans=0.125 2024-08-10 09:28:53,310 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.625e-03 2024-08-10 09:28:58,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5 2024-08-10 09:29:06,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5000, loss[loss=0.1367, beats_loss=0.01212, ecapa_loss=0.0002116, whisper_loss=0.1225, over 22128.00 frames. ], tot_loss[loss=0.1125, beats_loss=0.01222, ecapa_loss=0.0002739, whisper_loss=0.0975, over 3864729.64 frames. ], batch size: 80, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:29:11,563 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 09:29:26,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-10 09:29:42,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=484870.0, ans=12.0 2024-08-10 09:29:52,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+01 3.034e+01 3.424e+01 4.085e+01 5.403e+01, threshold=6.848e+01, percent-clipped=0.0 2024-08-10 09:30:03,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.81 vs. limit=22.5 2024-08-10 09:30:04,763 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 09:30:12,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=485070.0, ans=0.125 2024-08-10 09:30:18,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=485070.0, ans=0.125 2024-08-10 09:30:23,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=485170.0, ans=10.0 2024-08-10 09:30:35,681 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 09:30:39,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0 2024-08-10 09:30:44,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5050, loss[loss=0.1013, beats_loss=0.01548, ecapa_loss=0.0002346, whisper_loss=0.08344, over 22842.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01227, ecapa_loss=0.0002733, whisper_loss=0.0971, over 3890837.90 frames. ], batch size: 91, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:30:53,883 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 09:30:58,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=485270.0, ans=0.125 2024-08-10 09:31:02,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=485370.0, ans=0.125 2024-08-10 09:31:12,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=485370.0, ans=0.0 2024-08-10 09:31:13,259 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 09:31:21,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2024-08-10 09:31:24,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=485470.0, ans=0.125 2024-08-10 09:31:35,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.69 vs. limit=22.5 2024-08-10 09:31:53,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=485570.0, ans=0.0 2024-08-10 09:32:16,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5100, loss[loss=0.1144, beats_loss=0.01225, ecapa_loss=0.0003092, whisper_loss=0.09904, over 21540.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01222, ecapa_loss=0.0002748, whisper_loss=0.09804, over 3927744.08 frames. ], batch size: 90, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:32:20,320 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 09:32:23,822 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 09:32:23,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485770.0, ans=0.1 2024-08-10 09:32:40,666 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 09:32:45,201 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.495e+01 3.245e+01 3.767e+01 4.403e+01 1.091e+02, threshold=7.533e+01, percent-clipped=4.0 2024-08-10 09:33:05,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2024-08-10 09:33:08,948 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-10 09:33:20,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5150, loss[loss=0.1076, beats_loss=0.01263, ecapa_loss=0.0003188, whisper_loss=0.09175, over 17631.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01226, ecapa_loss=0.0002733, whisper_loss=0.09729, over 3939190.22 frames. ], batch size: 74, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:33:30,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=486270.0, ans=0.1 2024-08-10 09:33:31,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486370.0, ans=0.125 2024-08-10 09:33:35,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=486370.0, ans=0.125 2024-08-10 09:33:38,181 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.674e+00 2024-08-10 09:33:45,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486470.0, ans=0.1 2024-08-10 09:33:45,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486470.0, ans=0.125 2024-08-10 09:33:58,006 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 09:33:59,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=486570.0, ans=0.125 2024-08-10 09:34:08,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=486570.0, ans=0.125 2024-08-10 09:34:21,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486770.0, ans=0.125 2024-08-10 09:34:22,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5200, loss[loss=0.1084, beats_loss=0.0127, ecapa_loss=0.0002678, whisper_loss=0.09301, over 22872.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01221, ecapa_loss=0.0002737, whisper_loss=0.09701, over 3915457.97 frames. ], batch size: 88, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:34:38,482 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 09:34:51,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+01 3.041e+01 3.408e+01 4.043e+01 9.843e+01, threshold=6.816e+01, percent-clipped=1.0 2024-08-10 09:35:09,974 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-10 09:35:24,826 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 09:35:25,919 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5250, loss[loss=0.09532, beats_loss=0.01457, ecapa_loss=0.0003373, whisper_loss=0.07738, over 19950.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01222, ecapa_loss=0.0002741, whisper_loss=0.09623, over 3907243.29 frames. 
], batch size: 90, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:35:56,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=487470.0, ans=0.5 2024-08-10 09:36:01,668 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-10 09:36:06,519 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-10 09:36:12,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487570.0, ans=0.1 2024-08-10 09:36:15,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=487570.0, ans=0.125 2024-08-10 09:36:17,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=487670.0, ans=0.0 2024-08-10 09:36:21,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-10 09:36:21,969 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-10 09:36:29,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5300, loss[loss=0.103, beats_loss=0.01132, ecapa_loss=0.0002991, whisper_loss=0.08869, over 20165.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01214, ecapa_loss=0.000275, whisper_loss=0.0965, over 3890293.91 frames. 
], batch size: 82, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:36:31,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=487770.0, ans=0.0 2024-08-10 09:36:37,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=487770.0, ans=0.0 2024-08-10 09:36:40,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=487770.0, ans=0.015 2024-08-10 09:36:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=487970.0, ans=0.2 2024-08-10 09:36:56,210 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-10 09:36:58,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+01 3.142e+01 3.530e+01 4.338e+01 6.802e+01, threshold=7.061e+01, percent-clipped=0.0 2024-08-10 09:37:23,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488170.0, ans=0.0 2024-08-10 09:37:29,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=488170.0, ans=0.125 2024-08-10 09:37:32,238 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 09:37:33,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5350, loss[loss=0.1042, beats_loss=0.01243, ecapa_loss=0.0002538, whisper_loss=0.08922, over 16564.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01216, ecapa_loss=0.0002735, whisper_loss=0.09643, over 3884918.11 frames. 
], batch size: 64, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:37:39,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=488270.0, ans=0.125 2024-08-10 09:37:50,305 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.766e+00 2024-08-10 09:38:00,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488470.0, ans=0.1 2024-08-10 09:38:17,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=488570.0, ans=0.0 2024-08-10 09:38:20,703 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 09:38:36,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5400, loss[loss=0.1172, beats_loss=0.01147, ecapa_loss=0.0002818, whisper_loss=0.1029, over 20887.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01215, ecapa_loss=0.0002721, whisper_loss=0.09651, over 3897898.56 frames. ], batch size: 83, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:38:36,872 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 09:38:58,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=488870.0, ans=0.2 2024-08-10 09:38:59,316 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.954e-02 2024-08-10 09:39:05,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.881e+01 3.134e+01 3.602e+01 5.252e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-10 09:39:11,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488970.0, ans=0.125 2024-08-10 09:39:11,990 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 09:39:17,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=489070.0, ans=0.025 2024-08-10 09:39:23,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=489070.0, ans=0.125 2024-08-10 09:39:25,980 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 09:39:33,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=489170.0, ans=0.125 2024-08-10 09:39:39,760 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5450, loss[loss=0.1105, beats_loss=0.01565, ecapa_loss=0.0002826, whisper_loss=0.092, over 21296.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01223, ecapa_loss=0.000272, whisper_loss=0.09672, over 3911890.97 frames. 
], batch size: 87, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:39:49,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=489270.0, ans=0.0 2024-08-10 09:40:03,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-10 09:40:34,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=489670.0, ans=0.5 2024-08-10 09:40:43,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5500, loss[loss=0.1071, beats_loss=0.01404, ecapa_loss=0.0002015, whisper_loss=0.09101, over 18385.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01217, ecapa_loss=0.0002713, whisper_loss=0.09735, over 3927039.99 frames. ], batch size: 72, lr: 1.53e-02, grad_scale: 536870912.0 2024-08-10 09:40:48,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-08-10 09:40:53,559 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 09:41:00,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=489870.0, ans=0.0 2024-08-10 09:41:07,476 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 09:41:07,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=489970.0, ans=0.2 2024-08-10 09:41:10,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.92 vs. 
limit=12.0 2024-08-10 09:41:12,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 3.173e+01 3.591e+01 4.081e+01 1.350e+02, threshold=7.183e+01, percent-clipped=1.0 2024-08-10 09:41:26,235 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 09:41:26,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2024-08-10 09:41:33,990 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 09:41:47,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5550, loss[loss=0.1031, beats_loss=0.01124, ecapa_loss=0.0003164, whisper_loss=0.08872, over 21856.00 frames. ], tot_loss[loss=0.1126, beats_loss=0.01212, ecapa_loss=0.0002737, whisper_loss=0.09779, over 3945241.97 frames. ], batch size: 91, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:41:58,189 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 09:42:06,026 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 09:42:08,637 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 09:42:22,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=490470.0, ans=0.125 2024-08-10 09:42:27,647 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 09:42:41,719 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
41 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 09:42:47,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490670.0, ans=0.1 2024-08-10 09:42:51,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5600, loss[loss=0.0981, beats_loss=0.01299, ecapa_loss=0.0002849, whisper_loss=0.08226, over 21825.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01209, ecapa_loss=0.000275, whisper_loss=0.09783, over 3959853.11 frames. ], batch size: 93, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:42:53,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2024-08-10 09:43:08,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=490870.0, ans=0.125 2024-08-10 09:43:19,626 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 09:43:20,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 3.015e+01 3.404e+01 4.297e+01 6.726e+01, threshold=6.809e+01, percent-clipped=0.0 2024-08-10 09:43:23,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=490970.0, ans=0.2 2024-08-10 09:43:26,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=490970.0, ans=0.0 2024-08-10 09:43:28,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=491070.0, ans=0.0 2024-08-10 09:43:34,038 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 09:43:36,446 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 09:43:37,966 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 09:43:42,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=491170.0, ans=0.0 2024-08-10 09:43:50,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=491170.0, ans=0.125 2024-08-10 09:43:55,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5650, loss[loss=0.09866, beats_loss=0.01209, ecapa_loss=0.0002925, whisper_loss=0.08364, over 17607.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01215, ecapa_loss=0.0002736, whisper_loss=0.09735, over 3953136.20 frames. ], batch size: 71, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:43:55,933 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 09:44:25,204 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 09:44:25,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=491470.0, ans=0.0 2024-08-10 09:44:29,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=491470.0, ans=0.2 2024-08-10 09:44:31,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=491470.0, ans=10.0 2024-08-10 09:44:56,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-08-10 09:44:56,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.70 vs. 
limit=15.0 2024-08-10 09:44:59,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5700, loss[loss=0.1062, beats_loss=0.01432, ecapa_loss=0.0002205, whisper_loss=0.08969, over 18567.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01213, ecapa_loss=0.0002734, whisper_loss=0.09787, over 3965166.17 frames. ], batch size: 74, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:45:01,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491770.0, ans=0.1 2024-08-10 09:45:18,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=491870.0, ans=0.125 2024-08-10 09:45:20,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=491870.0, ans=0.0 2024-08-10 09:45:33,068 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.196e+01 3.076e+01 3.438e+01 4.149e+01 8.224e+01, threshold=6.876e+01, percent-clipped=3.0 2024-08-10 09:45:35,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2024-08-10 09:45:39,301 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 09:45:44,084 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 09:45:50,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=492070.0, ans=0.125 2024-08-10 09:46:03,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=492170.0, ans=0.2 2024-08-10 09:46:14,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5750, loss[loss=0.1037, beats_loss=0.01299, ecapa_loss=0.0002746, whisper_loss=0.088, over 21244.00 frames. 
], tot_loss[loss=0.1124, beats_loss=0.01219, ecapa_loss=0.0002732, whisper_loss=0.09748, over 3932742.01 frames. ], batch size: 86, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:46:22,953 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 09:46:42,727 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 09:46:44,134 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 09:46:47,997 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.683e+05 2024-08-10 09:46:54,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=492470.0, ans=0.125 2024-08-10 09:46:57,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-10 09:47:07,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=492570.0, ans=0.0 2024-08-10 09:47:28,881 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 09:47:29,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=492670.0, ans=0.0 2024-08-10 09:47:37,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5800, loss[loss=0.12, beats_loss=0.0124, ecapa_loss=0.0002408, whisper_loss=0.1052, over 19493.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01216, ecapa_loss=0.0002732, whisper_loss=0.09781, over 3903025.22 frames. 
], batch size: 75, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:47:45,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=492770.0, ans=0.125 2024-08-10 09:47:54,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=492870.0, ans=0.125 2024-08-10 09:48:11,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+01 3.153e+01 3.469e+01 4.030e+01 1.339e+02, threshold=6.938e+01, percent-clipped=1.0 2024-08-10 09:48:12,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=492970.0, ans=0.0 2024-08-10 09:48:23,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=493070.0, ans=6.0 2024-08-10 09:48:47,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5850, loss[loss=0.1253, beats_loss=0.01302, ecapa_loss=0.0002493, whisper_loss=0.1098, over 19591.00 frames. ], tot_loss[loss=0.1133, beats_loss=0.01211, ecapa_loss=0.0002724, whisper_loss=0.09844, over 3919265.22 frames. ], batch size: 77, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:48:48,016 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 09:48:56,859 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 09:48:57,098 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.221e-01 2024-08-10 09:49:06,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2024-08-10 09:49:14,623 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
21 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 09:49:21,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2024-08-10 09:49:25,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 09:49:28,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=493570.0, ans=0.05 2024-08-10 09:49:51,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5900, loss[loss=0.1159, beats_loss=0.01389, ecapa_loss=0.0002033, whisper_loss=0.1, over 20133.00 frames. ], tot_loss[loss=0.1127, beats_loss=0.01219, ecapa_loss=0.0002704, whisper_loss=0.09779, over 3920119.95 frames. ], batch size: 79, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:50:08,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=493870.0, ans=0.125 2024-08-10 09:50:09,297 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-10 09:50:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=493970.0, ans=0.04949747468305833 2024-08-10 09:50:20,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+01 2.982e+01 3.256e+01 3.844e+01 1.503e+02, threshold=6.513e+01, percent-clipped=1.0 2024-08-10 09:50:36,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494070.0, ans=0.1 2024-08-10 09:50:54,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 5950, loss[loss=0.1137, beats_loss=0.009676, ecapa_loss=0.0002954, whisper_loss=0.1011, over 18523.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01219, ecapa_loss=0.0002702, whisper_loss=0.09741, over 3938443.99 frames. 
], batch size: 73, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:50:56,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=494270.0, ans=0.125 2024-08-10 09:51:02,741 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-10 09:51:12,974 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.569e-03 2024-08-10 09:51:16,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=494370.0, ans=0.125 2024-08-10 09:51:33,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=494570.0, ans=0.125 2024-08-10 09:51:44,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2024-08-10 09:51:56,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=494670.0, ans=0.125 2024-08-10 09:51:58,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6000, loss[loss=0.1472, beats_loss=0.009389, ecapa_loss=0.0002467, whisper_loss=0.1354, over 22182.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01217, ecapa_loss=0.0002694, whisper_loss=0.09819, over 3937830.61 frames. ], batch size: 81, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:51:58,748 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 09:52:40,006 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on ASR_libri: loss=0.2669, beats_loss=0, ecapa_loss=0.0008114, whisper_loss=0.2588, over 922467.00 frames. 2024-08-10 09:52:55,600 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on SV_voxceleb1: loss=0.00707, beats_loss=0, ecapa_loss=0.000707, whisper_loss=0, over 939242.00 frames. 
2024-08-10 09:54:53,690 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on AT_audioset: loss=0.028, beats_loss=0.028, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 09:54:53,695 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 09:55:04,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=494770.0, ans=0.125 2024-08-10 09:55:14,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-10 09:55:18,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2024-08-10 09:55:23,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+01 2.994e+01 3.624e+01 4.180e+01 6.998e+01, threshold=7.249e+01, percent-clipped=2.0 2024-08-10 09:55:28,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=494970.0, ans=0.0 2024-08-10 09:55:29,571 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 09:55:32,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=495070.0, ans=0.2 2024-08-10 09:55:58,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6050, loss[loss=0.1103, beats_loss=0.01083, ecapa_loss=0.000241, whisper_loss=0.09705, over 16216.00 frames. ], tot_loss[loss=0.113, beats_loss=0.01213, ecapa_loss=0.0002693, whisper_loss=0.09817, over 3919217.24 frames. 
], batch size: 60, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:56:09,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=495370.0, ans=0.2 2024-08-10 09:56:14,892 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 09:56:16,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=495370.0, ans=0.125 2024-08-10 09:56:20,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=12.0 2024-08-10 09:56:22,625 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 09:56:35,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=495570.0, ans=0.125 2024-08-10 09:56:41,852 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 09:56:50,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0 2024-08-10 09:57:02,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6100, loss[loss=0.09812, beats_loss=0.01345, ecapa_loss=0.0002455, whisper_loss=0.08221, over 14615.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01218, ecapa_loss=0.0002701, whisper_loss=0.09736, over 3916829.94 frames. ], batch size: 58, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:57:05,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=12.0 2024-08-10 09:57:09,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. 
limit=15.0 2024-08-10 09:57:16,564 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 09:57:22,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=495870.0, ans=0.0 2024-08-10 09:57:30,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495970.0, ans=0.1 2024-08-10 09:57:31,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.767e+01 3.161e+01 3.682e+01 7.056e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 09:58:01,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.17 vs. limit=22.5 2024-08-10 09:58:06,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0 2024-08-10 09:58:06,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6150, loss[loss=0.08092, beats_loss=0.01379, ecapa_loss=0.0003107, whisper_loss=0.06403, over 16560.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01222, ecapa_loss=0.0002703, whisper_loss=0.09675, over 3914756.55 frames. ], batch size: 69, lr: 1.52e-02, grad_scale: 536870912.0 2024-08-10 09:58:11,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-08-10 09:58:21,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2024-08-10 09:58:42,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=496470.0, ans=0.125 2024-08-10 09:59:02,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=496670.0, ans=0.0 2024-08-10 09:59:10,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6200, loss[loss=0.1025, beats_loss=0.01563, ecapa_loss=0.0001948, whisper_loss=0.08495, over 23009.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01217, ecapa_loss=0.0002718, whisper_loss=0.09714, over 3914628.81 frames. ], batch size: 92, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 09:59:12,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496770.0, ans=0.0 2024-08-10 09:59:13,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496770.0, ans=0.1 2024-08-10 09:59:25,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=496870.0, ans=0.2 2024-08-10 09:59:32,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-10 09:59:40,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.342e+01 3.143e+01 3.568e+01 4.018e+01 6.093e+01, threshold=7.137e+01, percent-clipped=0.0 2024-08-10 09:59:46,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496970.0, ans=0.1 2024-08-10 09:59:54,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=497070.0, ans=0.125 2024-08-10 10:00:13,661 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 10:00:16,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6250, loss[loss=0.09257, beats_loss=0.01423, ecapa_loss=0.0002951, whisper_loss=0.07539, over 21201.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01214, ecapa_loss=0.0002715, whisper_loss=0.09676, over 3909106.74 frames. ], batch size: 93, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:00:19,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=497270.0, ans=0.125 2024-08-10 10:00:47,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=497470.0, ans=0.125 2024-08-10 10:00:57,879 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:00:59,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-10 10:01:06,622 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 10:01:10,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=497670.0, ans=0.125 2024-08-10 10:01:16,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=497670.0, ans=0.125 2024-08-10 10:01:20,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6300, loss[loss=0.09892, beats_loss=0.01689, ecapa_loss=0.0002454, whisper_loss=0.07957, over 20951.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0122, ecapa_loss=0.0002703, whisper_loss=0.09684, over 3908457.33 frames. 
], batch size: 88, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:01:21,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-08-10 10:01:24,828 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 10:01:41,712 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 10:01:42,922 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 10:01:45,538 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 10:01:50,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+01 3.096e+01 3.544e+01 4.139e+01 6.723e+01, threshold=7.089e+01, percent-clipped=0.0 2024-08-10 10:01:57,722 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 10:02:08,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:02:23,412 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 10:02:24,788 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 10:02:25,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6350, loss[loss=0.1076, beats_loss=0.009696, ecapa_loss=0.0002332, whisper_loss=0.09556, over 16804.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01213, ecapa_loss=0.0002711, whisper_loss=0.09706, over 3892942.67 frames. ], batch size: 64, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:02:30,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. 
limit=15.0 2024-08-10 10:02:34,700 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 10:02:41,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=498370.0, ans=0.125 2024-08-10 10:02:42,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=498370.0, ans=0.0 2024-08-10 10:02:52,619 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-10 10:03:00,486 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-10 10:03:01,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=498470.0, ans=0.05 2024-08-10 10:03:03,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=498570.0, ans=0.07 2024-08-10 10:03:12,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=498570.0, ans=0.0 2024-08-10 10:03:18,440 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 10:03:29,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6400, loss[loss=0.09781, beats_loss=0.01202, ecapa_loss=0.0002491, whisper_loss=0.08331, over 16484.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01215, ecapa_loss=0.0002683, whisper_loss=0.09716, over 3900131.83 frames. ], batch size: 64, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:03:33,164 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 10:03:52,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498870.0, ans=0.1 2024-08-10 10:03:53,487 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 10:04:01,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+01 3.035e+01 3.531e+01 4.097e+01 5.944e+01, threshold=7.062e+01, percent-clipped=0.0 2024-08-10 10:04:01,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498970.0, ans=0.1 2024-08-10 10:04:22,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=499070.0, ans=0.125 2024-08-10 10:04:43,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6450, loss[loss=0.12, beats_loss=0.01164, ecapa_loss=0.0003167, whisper_loss=0.1052, over 22855.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01223, ecapa_loss=0.0002688, whisper_loss=0.09701, over 3915527.83 frames. ], batch size: 92, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:04:45,121 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 10:05:09,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=499370.0, ans=0.07 2024-08-10 10:05:11,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499370.0, ans=0.1 2024-08-10 10:05:11,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499370.0, ans=0.1 2024-08-10 10:05:18,582 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 10:05:24,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=499470.0, ans=0.125 2024-08-10 10:05:35,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=499570.0, ans=0.125 2024-08-10 10:05:38,585 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-10 10:05:43,030 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 10:05:44,212 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 10:05:53,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=499670.0, ans=0.2 2024-08-10 10:05:58,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6500, loss[loss=0.1083, beats_loss=0.01322, ecapa_loss=0.0002696, whisper_loss=0.09243, over 22539.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.01215, ecapa_loss=0.0002684, whisper_loss=0.09744, over 3881006.24 frames. ], batch size: 93, lr: 1.51e-02, grad_scale: 536870912.0 2024-08-10 10:06:11,644 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-10 10:06:28,651 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-10 10:06:33,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 3.134e+01 3.492e+01 3.881e+01 6.321e+01, threshold=6.984e+01, percent-clipped=0.0 2024-08-10 10:06:41,089 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 10:06:47,928 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 10:06:48,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=500070.0, ans=0.015 2024-08-10 10:06:55,733 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 10:07:15,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6550, loss[loss=0.1119, beats_loss=0.01418, ecapa_loss=0.0002976, whisper_loss=0.09478, over 21069.00 frames. ], tot_loss[loss=0.1129, beats_loss=0.01216, ecapa_loss=0.0002688, whisper_loss=0.09804, over 3897842.57 frames. ], batch size: 90, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:07:35,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=500370.0, ans=0.125 2024-08-10 10:07:38,540 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 10:07:44,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500370.0, ans=0.1 2024-08-10 10:07:46,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-10 10:08:23,680 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 10:08:34,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=500670.0, ans=0.0 2024-08-10 10:08:41,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6600, loss[loss=0.1192, beats_loss=0.007886, ecapa_loss=0.0003441, whisper_loss=0.1078, over 20090.00 frames. ], tot_loss[loss=0.1135, beats_loss=0.01209, ecapa_loss=0.0002722, whisper_loss=0.09868, over 3927401.72 frames. 
], batch size: 80, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:08:43,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=500770.0, ans=0.125 2024-08-10 10:08:52,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=500770.0, ans=0.125 2024-08-10 10:09:12,549 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 10:09:12,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500970.0, ans=0.1 2024-08-10 10:09:18,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 3.113e+01 3.580e+01 3.995e+01 6.180e+01, threshold=7.160e+01, percent-clipped=0.0 2024-08-10 10:09:19,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-08-10 10:09:21,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=500970.0, ans=0.125 2024-08-10 10:09:46,322 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-10 10:10:00,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6650, loss[loss=0.1221, beats_loss=0.01105, ecapa_loss=0.0002239, whisper_loss=0.1088, over 17906.00 frames. ], tot_loss[loss=0.1131, beats_loss=0.01213, ecapa_loss=0.0002715, whisper_loss=0.09826, over 3920809.00 frames. 
], batch size: 64, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:10:01,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501270.0, ans=0.1 2024-08-10 10:10:16,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=501370.0, ans=0.0 2024-08-10 10:10:22,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=501370.0, ans=0.05 2024-08-10 10:10:33,967 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 10:10:35,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=501470.0, ans=0.05 2024-08-10 10:10:55,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=501570.0, ans=0.125 2024-08-10 10:11:01,100 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 10:11:06,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=501670.0, ans=0.125 2024-08-10 10:11:19,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501670.0, ans=0.125 2024-08-10 10:11:21,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6700, loss[loss=0.1027, beats_loss=0.01098, ecapa_loss=0.0003127, whisper_loss=0.0886, over 13408.00 frames. ], tot_loss[loss=0.112, beats_loss=0.01218, ecapa_loss=0.0002712, whisper_loss=0.09711, over 3885579.81 frames. ], batch size: 53, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:11:33,975 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 10:11:38,097 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 10:11:40,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=501870.0, ans=0.0 2024-08-10 10:12:00,465 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 2.966e+01 3.489e+01 3.963e+01 6.232e+01, threshold=6.977e+01, percent-clipped=0.0 2024-08-10 10:12:13,284 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 32 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 10:12:13,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=502070.0, ans=0.0 2024-08-10 10:12:16,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-08-10 10:12:29,237 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 10:12:45,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6750, loss[loss=0.1254, beats_loss=0.01086, ecapa_loss=0.0002643, whisper_loss=0.1119, over 20105.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01221, ecapa_loss=0.0002702, whisper_loss=0.09688, over 3879810.46 frames. ], batch size: 78, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:12:47,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=502270.0, ans=0.2 2024-08-10 10:12:56,504 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 10:12:56,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=502270.0, ans=0.125 2024-08-10 10:13:11,514 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
40 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 10:13:13,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=502370.0, ans=0.2 2024-08-10 10:13:25,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=502470.0, ans=0.125 2024-08-10 10:13:31,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.87 vs. limit=15.0 2024-08-10 10:13:57,370 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 10:14:04,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=502670.0, ans=0.0 2024-08-10 10:14:11,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6800, loss[loss=0.1143, beats_loss=0.01376, ecapa_loss=0.0002114, whisper_loss=0.09846, over 16312.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01215, ecapa_loss=0.0002691, whisper_loss=0.09673, over 3885199.50 frames. ], batch size: 63, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:14:11,336 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 10:14:24,545 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 40 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 10:14:27,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=502870.0, ans=0.125 2024-08-10 10:14:33,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.43 vs. 
limit=22.5 2024-08-10 10:14:36,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=502870.0, ans=0.125 2024-08-10 10:14:41,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=502870.0, ans=0.0 2024-08-10 10:14:43,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502870.0, ans=0.125 2024-08-10 10:14:50,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.003e+01 3.545e+01 4.063e+01 8.445e+01, threshold=7.089e+01, percent-clipped=2.0 2024-08-10 10:14:54,464 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 10:14:59,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2024-08-10 10:15:31,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=503170.0, ans=15.0 2024-08-10 10:15:35,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6850, loss[loss=0.1134, beats_loss=0.009851, ecapa_loss=0.0002937, whisper_loss=0.1006, over 14612.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01209, ecapa_loss=0.0002704, whisper_loss=0.09676, over 3832468.46 frames. ], batch size: 54, lr: 1.51e-02, grad_scale: 1073741824.0 2024-08-10 10:15:42,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=503270.0, ans=0.125 2024-08-10 10:15:42,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.66 vs. 
limit=22.5 2024-08-10 10:15:46,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=503270.0, ans=0.125 2024-08-10 10:15:50,502 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-10 10:15:51,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-10 10:16:09,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=503470.0, ans=0.125 2024-08-10 10:16:09,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503470.0, ans=0.1 2024-08-10 10:16:09,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2024-08-10 10:16:24,411 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 10:16:46,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=503670.0, ans=0.125 2024-08-10 10:16:54,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6900, loss[loss=0.108, beats_loss=0.01019, ecapa_loss=0.0003783, whisper_loss=0.09405, over 19290.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01208, ecapa_loss=0.0002697, whisper_loss=0.09703, over 3830025.68 frames. ], batch size: 87, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:17:09,659 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-10 10:17:30,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+01 3.010e+01 3.385e+01 3.920e+01 6.674e+01, threshold=6.771e+01, percent-clipped=0.0 2024-08-10 10:17:34,759 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 10:17:43,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504070.0, ans=0.125 2024-08-10 10:17:55,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504070.0, ans=0.1 2024-08-10 10:18:14,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 6950, loss[loss=0.1198, beats_loss=0.0105, ecapa_loss=0.0002427, whisper_loss=0.1069, over 14678.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01217, ecapa_loss=0.0002669, whisper_loss=0.09622, over 3846777.31 frames. ], batch size: 53, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:18:44,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=504370.0, ans=0.125 2024-08-10 10:19:17,997 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-10 10:19:25,526 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 10:19:32,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. 
limit=15.0 2024-08-10 10:19:33,714 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:19:33,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=504670.0, ans=0.0 2024-08-10 10:19:36,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7000, loss[loss=0.1184, beats_loss=0.01351, ecapa_loss=0.0002287, whisper_loss=0.1026, over 23415.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01212, ecapa_loss=0.0002676, whisper_loss=0.09608, over 3820227.10 frames. ], batch size: 94, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:19:36,612 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-10 10:19:42,544 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-10 10:19:53,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=504870.0, ans=0.2 2024-08-10 10:19:59,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-10 10:20:12,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.871e+01 3.202e+01 3.824e+01 7.169e+01, threshold=6.405e+01, percent-clipped=1.0 2024-08-10 10:20:16,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-10 10:20:23,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=505070.0, ans=0.125 2024-08-10 10:20:36,875 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-10 10:20:45,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=505170.0, ans=0.2 2024-08-10 10:20:54,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505170.0, ans=0.125 2024-08-10 10:20:57,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7050, loss[loss=0.1066, beats_loss=0.01129, ecapa_loss=0.0002301, whisper_loss=0.09305, over 16486.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.0121, ecapa_loss=0.0002687, whisper_loss=0.09605, over 3860608.12 frames. ], batch size: 65, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:21:00,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=505270.0, ans=0.025 2024-08-10 10:21:16,716 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-10 10:21:43,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505570.0, ans=0.125 2024-08-10 10:22:00,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.18 vs. limit=15.0 2024-08-10 10:22:03,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=505670.0, ans=0.07 2024-08-10 10:22:10,637 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-10 10:22:16,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7100, loss[loss=0.08506, beats_loss=0.01588, ecapa_loss=0.0002184, whisper_loss=0.067, over 17759.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01209, ecapa_loss=0.0002668, whisper_loss=0.09585, over 3862619.05 frames. 
], batch size: 70, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:22:20,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=505770.0, ans=0.2 2024-08-10 10:22:25,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=505770.0, ans=0.0 2024-08-10 10:22:29,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=505770.0, ans=0.2 2024-08-10 10:22:39,756 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 10:22:40,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=12.0 2024-08-10 10:22:40,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-08-10 10:22:54,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.041e+01 3.472e+01 4.120e+01 8.517e+01, threshold=6.943e+01, percent-clipped=2.0 2024-08-10 10:23:07,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=506070.0, ans=0.0 2024-08-10 10:23:20,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506170.0, ans=0.125 2024-08-10 10:23:23,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=506170.0, ans=0.5 2024-08-10 10:23:25,447 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 10:23:36,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7150, loss[loss=0.1005, beats_loss=0.01402, ecapa_loss=0.0003111, whisper_loss=0.08332, over 21317.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01209, ecapa_loss=0.0002675, whisper_loss=0.09624, over 3878210.32 frames. ], batch size: 90, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:23:40,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=506270.0, ans=0.04949747468305833 2024-08-10 10:23:46,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=506270.0, ans=0.125 2024-08-10 10:23:50,824 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 10:24:13,242 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-10 10:24:13,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=506470.0, ans=0.125 2024-08-10 10:24:14,947 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 8 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 10:24:25,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=506570.0, ans=0.125 2024-08-10 10:24:30,189 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 10:24:36,239 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 10:24:42,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=506670.0, ans=0.0 2024-08-10 10:24:44,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506670.0, ans=0.1 2024-08-10 10:24:47,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=506670.0, ans=0.2 2024-08-10 10:24:48,525 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 10:24:54,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-10 10:24:55,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7200, loss[loss=0.09463, beats_loss=0.01249, ecapa_loss=0.0002438, whisper_loss=0.0797, over 17772.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01213, ecapa_loss=0.0002679, whisper_loss=0.09648, over 3884339.00 frames. ], batch size: 68, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:24:55,745 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 10:25:12,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=506870.0, ans=0.125 2024-08-10 10:25:19,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=506870.0, ans=0.125 2024-08-10 10:25:28,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=506970.0, ans=0.0 2024-08-10 10:25:28,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=506970.0, ans=0.125 2024-08-10 10:25:29,771 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 10:25:35,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.350e+01 3.179e+01 3.637e+01 4.087e+01 6.923e+01, threshold=7.273e+01, percent-clipped=0.0 2024-08-10 10:25:40,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=506970.0, ans=0.05 2024-08-10 10:25:43,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=506970.0, ans=0.125 2024-08-10 10:25:56,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=507070.0, ans=0.125 2024-08-10 10:25:56,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=507070.0, ans=0.0 2024-08-10 10:26:01,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=507170.0, ans=0.0 2024-08-10 10:26:03,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=507170.0, ans=0.2 2024-08-10 
10:26:08,015 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 26 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-10 10:26:18,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7250, loss[loss=0.1084, beats_loss=0.01088, ecapa_loss=0.0002973, whisper_loss=0.09452, over 14066.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01212, ecapa_loss=0.0002676, whisper_loss=0.09705, over 3895116.44 frames. ], batch size: 57, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:26:46,817 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 10:26:52,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=507470.0, ans=0.125 2024-08-10 10:26:52,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=507470.0, ans=0.05 2024-08-10 10:27:04,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=507570.0, ans=0.125 2024-08-10 10:27:16,307 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 10:27:17,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.21 vs. limit=15.0 2024-08-10 10:27:28,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-10 10:27:29,754 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 10:27:33,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.30 vs. 
limit=15.0 2024-08-10 10:27:37,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7300, loss[loss=0.105, beats_loss=0.01181, ecapa_loss=0.0003193, whisper_loss=0.09003, over 20535.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01213, ecapa_loss=0.0002679, whisper_loss=0.09706, over 3907228.90 frames. ], batch size: 84, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:27:41,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507770.0, ans=0.1 2024-08-10 10:27:44,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507770.0, ans=0.1 2024-08-10 10:27:58,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=507870.0, ans=0.2 2024-08-10 10:28:00,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=507870.0, ans=0.0 2024-08-10 10:28:05,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507870.0, ans=0.0 2024-08-10 10:28:16,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.332e+01 2.965e+01 3.375e+01 3.820e+01 5.473e+01, threshold=6.750e+01, percent-clipped=0.0 2024-08-10 10:28:25,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-08-10 10:28:29,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=508070.0, ans=10.0 2024-08-10 10:28:59,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7350, loss[loss=0.1066, beats_loss=0.01285, ecapa_loss=0.0002647, whisper_loss=0.09106, over 15442.00 frames. 
], tot_loss[loss=0.1125, beats_loss=0.01214, ecapa_loss=0.0002663, whisper_loss=0.09765, over 3912198.77 frames. ], batch size: 63, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:29:05,401 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 10:29:11,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2024-08-10 10:29:22,115 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 10:29:23,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=508370.0, ans=0.0 2024-08-10 10:29:25,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2024-08-10 10:29:34,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5 2024-08-10 10:29:38,624 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 10:29:40,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508470.0, ans=0.1 2024-08-10 10:30:06,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2024-08-10 10:30:26,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7400, loss[loss=0.111, beats_loss=0.01226, ecapa_loss=0.0002668, whisper_loss=0.09612, over 14956.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01218, ecapa_loss=0.0002667, whisper_loss=0.09668, over 3937212.92 frames. 
], batch size: 59, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:30:39,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=508770.0, ans=0.0 2024-08-10 10:30:40,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=508770.0, ans=0.125 2024-08-10 10:30:55,614 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 10:31:05,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.310e+01 2.905e+01 3.226e+01 3.755e+01 5.990e+01, threshold=6.451e+01, percent-clipped=0.0 2024-08-10 10:31:38,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509170.0, ans=0.1 2024-08-10 10:31:52,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7450, loss[loss=0.1092, beats_loss=0.01052, ecapa_loss=0.0002337, whisper_loss=0.09634, over 23375.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.0122, ecapa_loss=0.0002654, whisper_loss=0.09682, over 3917517.19 frames. ], batch size: 91, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:31:53,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=509270.0, ans=0.0 2024-08-10 10:32:06,756 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
21 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-10 10:32:06,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=509270.0, ans=0.125 2024-08-10 10:32:07,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=509270.0, ans=0.125 2024-08-10 10:32:09,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=509370.0, ans=0.0 2024-08-10 10:32:18,744 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 10:32:30,197 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 10:32:31,518 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 10:32:55,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=509570.0, ans=0.125 2024-08-10 10:33:06,769 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 10:33:09,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=509670.0, ans=0.125 2024-08-10 10:33:12,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=509670.0, ans=0.0 2024-08-10 10:33:17,137 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-10 10:33:18,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7500, loss[loss=0.115, beats_loss=0.01038, ecapa_loss=0.0002437, whisper_loss=0.1022, over 17865.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01221, ecapa_loss=0.0002659, whisper_loss=0.09665, over 3925221.32 frames. 
], batch size: 69, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:33:56,971 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 29 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 10:33:58,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 3.154e+01 3.513e+01 4.160e+01 5.952e+01, threshold=7.025e+01, percent-clipped=0.0 2024-08-10 10:33:58,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-10 10:34:34,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=510170.0, ans=0.125 2024-08-10 10:34:37,322 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 15 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 10:34:38,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2024-08-10 10:34:39,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=510170.0, ans=0.125 2024-08-10 10:34:43,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7550, loss[loss=0.108, beats_loss=0.01445, ecapa_loss=0.0003079, whisper_loss=0.09048, over 22025.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01221, ecapa_loss=0.000263, whisper_loss=0.09705, over 3901924.40 frames. 
], batch size: 93, lr: 1.50e-02, grad_scale: 1073741824.0 2024-08-10 10:34:53,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=510270.0, ans=0.0 2024-08-10 10:35:01,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=510370.0, ans=0.125 2024-08-10 10:35:02,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=510370.0, ans=0.2 2024-08-10 10:35:27,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0 2024-08-10 10:35:35,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2024-08-10 10:35:46,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=510570.0, ans=0.125 2024-08-10 10:36:02,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0 2024-08-10 10:36:07,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7600, loss[loss=0.09343, beats_loss=0.01329, ecapa_loss=0.00023, whisper_loss=0.07784, over 16157.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01212, ecapa_loss=0.0002642, whisper_loss=0.09702, over 3871538.13 frames. 
], batch size: 65, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:36:19,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=510770.0, ans=0.0 2024-08-10 10:36:28,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=510870.0, ans=0.125 2024-08-10 10:36:45,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.819e+01 3.165e+01 3.521e+01 5.971e+01, threshold=6.331e+01, percent-clipped=0.0 2024-08-10 10:36:52,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=510970.0, ans=0.2 2024-08-10 10:36:54,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=510970.0, ans=0.2 2024-08-10 10:37:05,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2024-08-10 10:37:18,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=511170.0, ans=0.125 2024-08-10 10:37:20,678 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 10:37:24,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=511170.0, ans=0.125 2024-08-10 10:37:34,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7650, loss[loss=0.1139, beats_loss=0.01021, ecapa_loss=0.0003235, whisper_loss=0.1005, over 21461.00 frames. ], tot_loss[loss=0.1122, beats_loss=0.01198, ecapa_loss=0.000266, whisper_loss=0.09759, over 3898985.84 frames. 
], batch size: 87, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:37:34,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511270.0, ans=0.125 2024-08-10 10:37:58,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=511370.0, ans=0.125 2024-08-10 10:38:06,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=511470.0, ans=0.125 2024-08-10 10:38:21,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5 2024-08-10 10:38:51,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=511670.0, ans=0.2 2024-08-10 10:38:51,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-10 10:38:53,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2024-08-10 10:38:53,835 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 10:38:58,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7700, loss[loss=0.1042, beats_loss=0.01136, ecapa_loss=0.0002428, whisper_loss=0.09037, over 21993.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01198, ecapa_loss=0.0002676, whisper_loss=0.09747, over 3873756.89 frames. 
], batch size: 88, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:38:59,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=511770.0, ans=0.0 2024-08-10 10:39:11,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=511770.0, ans=0.125 2024-08-10 10:39:17,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=511870.0, ans=0.125 2024-08-10 10:39:39,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.498e+01 3.237e+01 3.581e+01 4.281e+01 8.585e+01, threshold=7.162e+01, percent-clipped=2.0 2024-08-10 10:39:47,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-10 10:39:47,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.81 vs. limit=15.0 2024-08-10 10:39:51,545 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 10:39:53,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=512070.0, ans=0.0 2024-08-10 10:39:57,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.77 vs. 
limit=22.5 2024-08-10 10:40:00,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=512070.0, ans=0.125 2024-08-10 10:40:04,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=512070.0, ans=0.0 2024-08-10 10:40:22,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7750, loss[loss=0.1223, beats_loss=0.009766, ecapa_loss=0.0002945, whisper_loss=0.1096, over 19356.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01201, ecapa_loss=0.0002679, whisper_loss=0.09655, over 3873522.68 frames. ], batch size: 78, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:40:24,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=512270.0, ans=0.0 2024-08-10 10:40:33,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.26 vs. limit=12.0 2024-08-10 10:40:52,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=512370.0, ans=0.0 2024-08-10 10:41:03,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=512470.0, ans=0.2 2024-08-10 10:41:06,815 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 10:41:07,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.58 vs. limit=5.0 2024-08-10 10:41:20,233 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 10:41:30,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=512670.0, ans=0.125 2024-08-10 10:41:45,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7800, loss[loss=0.09793, beats_loss=0.01369, ecapa_loss=0.0002409, whisper_loss=0.08183, over 22873.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01204, ecapa_loss=0.0002694, whisper_loss=0.09684, over 3904054.09 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:41:49,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=512770.0, ans=0.2 2024-08-10 10:41:51,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-08-10 10:41:52,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=512770.0, ans=0.0 2024-08-10 10:41:54,030 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 10:41:57,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512770.0, ans=0.125 2024-08-10 10:42:02,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512870.0, ans=0.0 2024-08-10 10:42:23,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+01 3.017e+01 3.377e+01 3.890e+01 5.572e+01, threshold=6.753e+01, percent-clipped=0.0 2024-08-10 10:42:33,934 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
31 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 10:43:03,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7850, loss[loss=0.1032, beats_loss=0.009377, ecapa_loss=0.0002723, whisper_loss=0.09114, over 16127.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01211, ecapa_loss=0.0002686, whisper_loss=0.09689, over 3916207.63 frames. ], batch size: 62, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:43:06,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=513270.0, ans=0.05 2024-08-10 10:43:27,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2024-08-10 10:43:46,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=513470.0, ans=0.2 2024-08-10 10:43:58,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-10 10:44:10,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513670.0, ans=0.125 2024-08-10 10:44:21,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513670.0, ans=0.1 2024-08-10 10:44:27,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7900, loss[loss=0.08837, beats_loss=0.01268, ecapa_loss=0.0002794, whisper_loss=0.0729, over 14744.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01211, ecapa_loss=0.0002682, whisper_loss=0.09686, over 3912816.31 frames. ], batch size: 58, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:44:34,482 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 10:44:39,726 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 10:44:49,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=513870.0, ans=0.125 2024-08-10 10:45:05,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+01 2.988e+01 3.259e+01 3.767e+01 5.929e+01, threshold=6.519e+01, percent-clipped=0.0 2024-08-10 10:45:27,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=514070.0, ans=0.2 2024-08-10 10:45:27,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=514070.0, ans=0.0 2024-08-10 10:45:32,629 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 10:45:38,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=514170.0, ans=0.125 2024-08-10 10:45:38,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=12.0 2024-08-10 10:45:41,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=514170.0, ans=0.125 2024-08-10 10:45:43,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-08-10 10:45:49,100 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 10:45:50,659 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 7950, loss[loss=0.1258, beats_loss=0.01123, ecapa_loss=0.0002641, whisper_loss=0.112, over 22216.00 frames. 
], tot_loss[loss=0.1113, beats_loss=0.01208, ecapa_loss=0.000267, whisper_loss=0.09652, over 3895883.78 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:45:51,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2024-08-10 10:45:52,180 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 10:45:58,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=514270.0, ans=0.125 2024-08-10 10:46:08,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514370.0, ans=0.1 2024-08-10 10:46:22,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-10 10:46:27,438 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.735e-01 2024-08-10 10:46:33,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=514470.0, ans=0.07 2024-08-10 10:46:49,796 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 10:46:53,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. 
limit=10.0 2024-08-10 10:46:54,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=514670.0, ans=0.125 2024-08-10 10:47:02,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=514670.0, ans=0.125 2024-08-10 10:47:12,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8000, loss[loss=0.09473, beats_loss=0.01528, ecapa_loss=0.0002309, whisper_loss=0.07714, over 15797.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01206, ecapa_loss=0.0002655, whisper_loss=0.09653, over 3891945.85 frames. ], batch size: 64, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:47:17,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=514770.0, ans=0.125 2024-08-10 10:47:24,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=514770.0, ans=0.125 2024-08-10 10:47:25,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=514770.0, ans=0.95 2024-08-10 10:47:28,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=514870.0, ans=0.2 2024-08-10 10:47:39,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=514870.0, ans=0.0 2024-08-10 10:47:43,355 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 10:47:49,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=514970.0, ans=0.125 2024-08-10 10:47:52,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 2.849e+01 3.134e+01 3.665e+01 7.663e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 10:48:02,818 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 10:48:18,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=515070.0, ans=0.0 2024-08-10 10:48:22,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-08-10 10:48:23,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=515170.0, ans=0.125 2024-08-10 10:48:28,292 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 10:48:40,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8050, loss[loss=0.1087, beats_loss=0.012, ecapa_loss=0.0002643, whisper_loss=0.09411, over 21875.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01203, ecapa_loss=0.0002674, whisper_loss=0.09635, over 3890998.96 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:49:23,877 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 10:49:44,376 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 28 from Vox, 18 fro AS 2024-08-10 10:49:44,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=515370.0, ans=0.0 2024-08-10 10:49:58,812 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
35 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 10:50:06,811 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 10:50:12,660 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 10:50:22,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=515570.0, ans=0.2 2024-08-10 10:50:22,915 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 10:50:24,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=515670.0, ans=10.0 2024-08-10 10:50:33,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-08-10 10:50:41,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8100, loss[loss=0.1234, beats_loss=0.01109, ecapa_loss=0.0002391, whisper_loss=0.1099, over 23224.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.0119, ecapa_loss=0.0002688, whisper_loss=0.09723, over 3905425.80 frames. ], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:50:45,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515770.0, ans=0.125 2024-08-10 10:51:02,055 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 34 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 10:51:07,564 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
37 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 10:51:20,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.323e+01 3.111e+01 3.674e+01 4.170e+01 5.858e+01, threshold=7.349e+01, percent-clipped=0.0 2024-08-10 10:51:25,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=515970.0, ans=0.0 2024-08-10 10:51:25,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=515970.0, ans=15.0 2024-08-10 10:51:26,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=515970.0, ans=0.125 2024-08-10 10:51:31,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=516070.0, ans=0.125 2024-08-10 10:51:44,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=516070.0, ans=0.0 2024-08-10 10:51:53,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=516170.0, ans=0.035 2024-08-10 10:52:03,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8150, loss[loss=0.12, beats_loss=0.01095, ecapa_loss=0.0002446, whisper_loss=0.1066, over 22654.00 frames. ], tot_loss[loss=0.1123, beats_loss=0.0118, ecapa_loss=0.0002691, whisper_loss=0.09776, over 3892148.26 frames. ], batch size: 88, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:52:10,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=516270.0, ans=0.07 2024-08-10 10:52:15,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=516270.0, ans=0.0 2024-08-10 10:52:34,104 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 10:52:50,499 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 10:52:53,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=516570.0, ans=0.0 2024-08-10 10:52:59,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-08-10 10:53:15,927 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 10:53:23,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8200, loss[loss=0.09904, beats_loss=0.01362, ecapa_loss=0.0002351, whisper_loss=0.08307, over 18160.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.0119, ecapa_loss=0.000269, whisper_loss=0.09724, over 3891138.97 frames. ], batch size: 72, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:53:33,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=516770.0, ans=0.125 2024-08-10 10:53:34,620 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 10:53:39,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=516870.0, ans=0.05 2024-08-10 10:53:42,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=516870.0, ans=0.125 2024-08-10 10:53:51,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=516870.0, ans=0.125 2024-08-10 10:54:00,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.913e+01 3.375e+01 3.842e+01 5.271e+01, threshold=6.749e+01, percent-clipped=0.0 2024-08-10 10:54:03,842 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-10 10:54:07,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-10 10:54:13,020 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 10:54:18,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.32 vs. limit=10.0 2024-08-10 10:54:20,377 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 10:54:34,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-10 10:54:42,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8250, loss[loss=0.08651, beats_loss=0.01456, ecapa_loss=0.0002366, whisper_loss=0.06958, over 22379.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002686, whisper_loss=0.09622, over 3923150.23 frames. 
], batch size: 92, lr: 1.49e-02, grad_scale: 1073741824.0 2024-08-10 10:54:44,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=517270.0, ans=0.0 2024-08-10 10:54:47,554 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 10:54:48,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2024-08-10 10:55:09,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2024-08-10 10:55:11,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=517370.0, ans=0.125 2024-08-10 10:55:17,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=517470.0, ans=0.0 2024-08-10 10:55:31,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517570.0, ans=0.125 2024-08-10 10:55:36,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517570.0, ans=0.125 2024-08-10 10:55:50,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517670.0, ans=0.1 2024-08-10 10:56:00,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8300, loss[loss=0.1229, beats_loss=0.01236, ecapa_loss=0.000272, whisper_loss=0.1079, over 20233.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01204, ecapa_loss=0.0002701, whisper_loss=0.09602, over 3902948.48 frames. 
], batch size: 81, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:56:05,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517770.0, ans=0.0 2024-08-10 10:56:09,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2024-08-10 10:56:36,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.966e+01 3.242e+01 3.921e+01 6.642e+01, threshold=6.483e+01, percent-clipped=0.0 2024-08-10 10:56:58,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=518070.0, ans=0.125 2024-08-10 10:56:59,587 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 10:57:06,556 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 10:57:24,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8350, loss[loss=0.1192, beats_loss=0.01207, ecapa_loss=0.0002563, whisper_loss=0.1046, over 19478.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01214, ecapa_loss=0.0002683, whisper_loss=0.09559, over 3896658.88 frames. ], batch size: 76, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:57:44,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=518370.0, ans=0.125 2024-08-10 10:58:19,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518570.0, ans=0.1 2024-08-10 10:58:30,549 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 10:58:49,659 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
25 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-10 10:58:57,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=518770.0, ans=0.2 2024-08-10 10:58:59,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8400, loss[loss=0.0901, beats_loss=0.01438, ecapa_loss=0.0002231, whisper_loss=0.07349, over 22087.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01212, ecapa_loss=0.0002702, whisper_loss=0.09639, over 3895493.25 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 10:59:01,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518770.0, ans=0.1 2024-08-10 10:59:05,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=518770.0, ans=0.0 2024-08-10 10:59:05,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=518770.0, ans=0.125 2024-08-10 10:59:07,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=518770.0, ans=0.2 2024-08-10 10:59:14,763 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-10 10:59:15,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=518770.0, ans=0.125 2024-08-10 10:59:25,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=518870.0, ans=0.2 2024-08-10 10:59:32,501 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
21 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 10:59:40,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 3.091e+01 3.394e+01 4.166e+01 8.578e+01, threshold=6.788e+01, percent-clipped=4.0 2024-08-10 10:59:45,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-10 10:59:48,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-10 10:59:57,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.96 vs. limit=10.0 2024-08-10 11:00:01,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519070.0, ans=0.125 2024-08-10 11:00:12,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=519170.0, ans=0.0 2024-08-10 11:00:25,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=519270.0, ans=0.0 2024-08-10 11:00:27,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8450, loss[loss=0.09718, beats_loss=0.01386, ecapa_loss=0.0002355, whisper_loss=0.08097, over 22284.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0121, ecapa_loss=0.0002705, whisper_loss=0.09638, over 3893310.22 frames. 
], batch size: 91, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:00:36,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=519270.0, ans=15.0 2024-08-10 11:01:29,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=519570.0, ans=0.125 2024-08-10 11:01:36,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=519670.0, ans=0.125 2024-08-10 11:01:41,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=519670.0, ans=0.2 2024-08-10 11:01:48,421 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 11:01:54,787 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8500, loss[loss=0.1171, beats_loss=0.01165, ecapa_loss=0.0002523, whisper_loss=0.1029, over 16504.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01203, ecapa_loss=0.0002702, whisper_loss=0.09637, over 3857692.93 frames. ], batch size: 63, lr: 1.48e-02, grad_scale: 1073741824.0 2024-08-10 11:01:59,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=519770.0, ans=0.125 2024-08-10 11:02:05,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519770.0, ans=0.0 2024-08-10 11:02:08,019 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-10 11:02:10,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=519770.0, ans=0.125 2024-08-10 11:02:30,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=519970.0, ans=0.0 2024-08-10 11:02:32,536 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-10 11:02:36,000 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-52000.pt 2024-08-10 11:02:40,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+01 3.171e+01 3.733e+01 4.165e+01 6.058e+01, threshold=7.466e+01, percent-clipped=0.0 2024-08-10 11:02:44,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=519970.0, ans=0.125 2024-08-10 11:02:51,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520070.0, ans=0.125 2024-08-10 11:03:02,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=520070.0, ans=0.125 2024-08-10 11:03:08,991 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 11:03:15,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=520170.0, ans=0.0 2024-08-10 11:03:17,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=520170.0, ans=0.0 2024-08-10 11:03:20,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=520170.0, ans=0.0 2024-08-10 11:03:26,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8550, loss[loss=0.1273, beats_loss=0.0127, ecapa_loss=0.0002203, whisper_loss=0.1124, over 23489.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01201, ecapa_loss=0.0002684, whisper_loss=0.0964, over 3866660.91 frames. ], batch size: 93, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:03:46,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=520370.0, ans=10.0 2024-08-10 11:04:02,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=520470.0, ans=0.125 2024-08-10 11:04:16,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=520470.0, ans=0.07 2024-08-10 11:04:27,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520570.0, ans=0.1 2024-08-10 11:04:42,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=520670.0, ans=0.2 2024-08-10 11:04:57,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8600, loss[loss=0.1003, beats_loss=0.01474, ecapa_loss=0.0001934, whisper_loss=0.08363, over 18707.00 frames. 
], tot_loss[loss=0.111, beats_loss=0.01204, ecapa_loss=0.0002664, whisper_loss=0.09626, over 3869565.69 frames. ], batch size: 73, lr: 1.48e-02, grad_scale: 2147483648.0 2024-08-10 11:05:29,980 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 11:05:36,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+01 3.011e+01 3.429e+01 3.879e+01 6.555e+01, threshold=6.857e+01, percent-clipped=0.0 2024-08-10 11:05:40,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-10 11:05:42,454 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 11:06:07,213 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-10 11:06:09,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=521170.0, ans=0.2 2024-08-10 11:06:11,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=521170.0, ans=0.125 2024-08-10 11:06:17,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=521170.0, ans=0.2 2024-08-10 11:06:17,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521170.0, ans=0.125 2024-08-10 11:06:20,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=521170.0, ans=0.0 2024-08-10 11:06:27,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8650, loss[loss=0.1038, beats_loss=0.01377, ecapa_loss=0.0002261, whisper_loss=0.0878, over 18895.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01209, ecapa_loss=0.0002653, whisper_loss=0.09587, over 3859520.07 frames. 
], batch size: 75, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:06:50,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=521370.0, ans=0.0
2024-08-10 11:07:00,597 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 20 from Vox, 38 from AS
2024-08-10 11:07:11,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0
2024-08-10 11:07:18,991 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 from AS
2024-08-10 11:07:28,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=521570.0, ans=15.0
2024-08-10 11:07:31,157 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 from AS
2024-08-10 11:07:38,200 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 from AS
2024-08-10 11:07:57,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8700, loss[loss=0.09662, beats_loss=0.01413, ecapa_loss=0.0002241, whisper_loss=0.08026, over 17072.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01212, ecapa_loss=0.0002656, whisper_loss=0.09525, over 3848317.54 frames. ], batch size: 69, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:07:57,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=521770.0, ans=0.0
2024-08-10 11:08:10,549 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 from AS
2024-08-10 11:08:37,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.301e+01 2.909e+01 3.289e+01 3.792e+01 9.063e+01, threshold=6.579e+01, percent-clipped=1.0
2024-08-10 11:08:42,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.20 vs. limit=22.5
2024-08-10 11:09:13,867 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-10 11:09:24,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522270.0, ans=0.125
2024-08-10 11:09:25,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8750, loss[loss=0.1235, beats_loss=0.01075, ecapa_loss=0.0002881, whisper_loss=0.1098, over 23540.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01211, ecapa_loss=0.0002673, whisper_loss=0.09502, over 3846003.77 frames. ], batch size: 95, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:09:31,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=522270.0, ans=0.125
2024-08-10 11:09:49,696 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 from AS
2024-08-10 11:09:53,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=522370.0, ans=0.2
2024-08-10 11:09:53,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=522370.0, ans=0.2
2024-08-10 11:09:58,377 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 30 from Vox, 37 from AS
2024-08-10 11:10:06,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=522470.0, ans=0.2
2024-08-10 11:10:09,963 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 from AS
2024-08-10 11:10:21,472 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 from AS
2024-08-10 11:10:29,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0
2024-08-10 11:10:34,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=522670.0, ans=0.0
2024-08-10 11:10:37,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=522670.0, ans=0.125
2024-08-10 11:10:44,962 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS
2024-08-10 11:10:47,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0
2024-08-10 11:10:52,246 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8800, loss[loss=0.1094, beats_loss=0.01223, ecapa_loss=0.0002414, whisper_loss=0.09475, over 23251.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01216, ecapa_loss=0.0002654, whisper_loss=0.09538, over 3884741.86 frames.
], batch size: 93, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:11:18,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=522870.0, ans=0.125
2024-08-10 11:11:32,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 3.151e+01 3.444e+01 3.946e+01 7.427e+01, threshold=6.887e+01, percent-clipped=2.0
2024-08-10 11:11:34,833 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 17 from Vox, 31 from AS
2024-08-10 11:12:05,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.73 vs. limit=22.5
2024-08-10 11:12:06,835 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 from AS
2024-08-10 11:12:17,922 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 17 from Vox, 27 from AS
2024-08-10 11:12:21,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8850, loss[loss=0.1214, beats_loss=0.01163, ecapa_loss=0.0002848, whisper_loss=0.1069, over 20093.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.0121, ecapa_loss=0.0002646, whisper_loss=0.09642, over 3854570.43 frames. ], batch size: 81, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:12:25,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=523270.0, ans=0.07
2024-08-10 11:12:37,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=523370.0, ans=0.2
2024-08-10 11:13:09,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=523470.0, ans=0.125
2024-08-10 11:13:09,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0
2024-08-10 11:13:19,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=523570.0, ans=0.125
2024-08-10 11:13:26,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=523570.0, ans=0.1
2024-08-10 11:13:45,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523670.0, ans=0.1
2024-08-10 11:13:48,942 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS
2024-08-10 11:13:51,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8900, loss[loss=0.1235, beats_loss=0.01146, ecapa_loss=0.0002745, whisper_loss=0.1093, over 22934.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01206, ecapa_loss=0.0002626, whisper_loss=0.09691, over 3823224.63 frames. ], batch size: 91, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:13:59,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=523770.0, ans=0.125
2024-08-10 11:14:18,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=523870.0, ans=0.0
2024-08-10 11:14:22,295 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 25 from Vox, 38 from AS
2024-08-10 11:14:35,418 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.997e+01 3.258e+01 3.778e+01 5.539e+01, threshold=6.517e+01, percent-clipped=0.0
2024-08-10 11:14:39,390 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 from AS
2024-08-10 11:14:43,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=523970.0, ans=0.125
2024-08-10 11:15:22,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 8950, loss[loss=0.1004, beats_loss=0.01524, ecapa_loss=0.0001933, whisper_loss=0.08323, over 14955.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01203, ecapa_loss=0.0002629, whisper_loss=0.09728, over 3822830.72 frames. ], batch size: 57, lr: 1.48e-02, grad_scale: 2147483648.0
2024-08-10 11:15:24,457 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS
2024-08-10 11:15:28,164 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS
2024-08-10 11:15:43,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=15.0
2024-08-10 11:16:15,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=524570.0, ans=0.0
2024-08-10 11:16:26,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524570.0, ans=0.1
2024-08-10 11:16:30,534 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS
2024-08-10 11:16:39,357 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-10 11:16:49,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9000, loss[loss=0.1153, beats_loss=0.01059, ecapa_loss=0.0003107, whisper_loss=0.1016, over 16960.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.0121, ecapa_loss=0.000264, whisper_loss=0.09678, over 3859712.12 frames.
], batch size: 67, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:16:49,802 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-10 11:17:36,020 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on ASR_libri: loss=0.2658, beats_loss=0, ecapa_loss=0.000793, whisper_loss=0.2579, over 922467.00 frames.
2024-08-10 11:17:43,372 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0932, 3.8533, 3.7516, 3.3783], device='cuda:0')
2024-08-10 11:17:54,650 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on SV_voxceleb1: loss=0.007025, beats_loss=0, ecapa_loss=0.0007025, whisper_loss=0, over 939242.00 frames.
2024-08-10 11:19:54,445 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on AT_audioset: loss=0.02753, beats_loss=0.02753, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-10 11:19:54,449 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-10 11:20:00,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=524770.0, ans=0.0
2024-08-10 11:20:02,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=524770.0, ans=0.035
2024-08-10 11:20:14,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=524870.0, ans=0.2
2024-08-10 11:20:22,278 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 36 from LS+wenet, 21 from Vox, 25 from AS
2024-08-10 11:20:33,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+01 3.014e+01 3.320e+01 3.675e+01 5.799e+01, threshold=6.641e+01, percent-clipped=0.0
2024-08-10 11:21:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=525170.0, ans=0.2
2024-08-10 11:21:18,152 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS
2024-08-10 11:21:19,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9050, loss[loss=0.1073, beats_loss=0.01205, ecapa_loss=0.0002603, whisper_loss=0.09269, over 23041.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01203, ecapa_loss=0.0002628, whisper_loss=0.09714, over 3851880.50 frames. ], batch size: 93, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:21:22,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0
2024-08-10 11:21:44,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=525370.0, ans=0.2
2024-08-10 11:21:45,864 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 from AS
2024-08-10 11:21:47,989 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 from AS
2024-08-10 11:22:11,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=525570.0, ans=0.0
2024-08-10 11:22:14,480 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.716e-03
2024-08-10 11:22:27,024 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-10 11:22:27,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=525670.0, ans=0.035
2024-08-10 11:22:43,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9100, loss[loss=0.1232, beats_loss=0.009106, ecapa_loss=0.0003225, whisper_loss=0.1109, over 22016.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01199, ecapa_loss=0.000264, whisper_loss=0.0968, over 3855621.51 frames. ], batch size: 91, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:22:59,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0
2024-08-10 11:23:09,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=525870.0, ans=0.125
2024-08-10 11:23:20,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 2.970e+01 3.325e+01 3.905e+01 6.354e+01, threshold=6.649e+01, percent-clipped=0.0
2024-08-10 11:23:50,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=526170.0, ans=0.125
2024-08-10 11:23:51,318 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS
2024-08-10 11:24:03,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9150, loss[loss=0.09996, beats_loss=0.01536, ecapa_loss=0.0002163, whisper_loss=0.08244, over 18242.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01202, ecapa_loss=0.0002617, whisper_loss=0.09686, over 3841966.08 frames. ], batch size: 72, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:24:05,410 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
25 from LS+wenet, 24 from Vox, 38 from AS
2024-08-10 11:24:11,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0
2024-08-10 11:24:14,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=526270.0, ans=0.125
2024-08-10 11:24:23,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=526370.0, ans=0.0
2024-08-10 11:24:26,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0
2024-08-10 11:24:30,855 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 from AS
2024-08-10 11:24:31,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0
2024-08-10 11:24:39,994 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 13 from Vox, 35 from AS
2024-08-10 11:24:40,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=526470.0, ans=0.0
2024-08-10 11:24:57,690 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 11:25:11,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=526670.0, ans=0.2
2024-08-10 11:25:18,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9200, loss[loss=0.1336, beats_loss=0.009093, ecapa_loss=0.0002224, whisper_loss=0.1223, over 18541.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01196, ecapa_loss=0.0002628, whisper_loss=0.09714, over 3851291.40 frames. ], batch size: 69, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:25:42,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=526870.0, ans=0.125
2024-08-10 11:25:44,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=526970.0, ans=0.2
2024-08-10 11:25:48,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=22.5
2024-08-10 11:25:49,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.987e+01 3.332e+01 3.744e+01 5.839e+01, threshold=6.663e+01, percent-clipped=0.0
2024-08-10 11:25:52,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0
2024-08-10 11:25:55,896 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 from AS
2024-08-10 11:25:58,404 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 from AS
2024-08-10 11:26:11,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5
2024-08-10 11:26:16,362 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS
2024-08-10 11:26:18,825 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 from AS
2024-08-10 11:26:24,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9250, loss[loss=0.08717, beats_loss=0.01133, ecapa_loss=0.0002715, whisper_loss=0.07312, over 17172.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01199, ecapa_loss=0.0002632, whisper_loss=0.09674, over 3858806.80 frames. ], batch size: 69, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:26:25,143 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 13 from Vox, 18 from AS
2024-08-10 11:26:33,940 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 from AS
2024-08-10 11:26:38,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=527370.0, ans=0.2
2024-08-10 11:26:44,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=527370.0, ans=0.125
2024-08-10 11:26:48,549 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-10 11:26:48,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=527370.0, ans=0.125
2024-08-10 11:26:53,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=527470.0, ans=0.2
2024-08-10 11:26:53,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=527470.0, ans=0.0
2024-08-10 11:26:55,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=527470.0, ans=0.125
2024-08-10 11:27:03,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0
2024-08-10 11:27:09,731 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-10 11:27:17,502 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
19 from LS+wenet, 20 from Vox, 20 from AS
2024-08-10 11:27:21,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527670.0, ans=0.1
2024-08-10 11:27:25,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=527670.0, ans=0.2
2024-08-10 11:27:28,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=527670.0, ans=0.125
2024-08-10 11:27:30,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9300, loss[loss=0.1053, beats_loss=0.01461, ecapa_loss=0.0002479, whisper_loss=0.08818, over 21450.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01205, ecapa_loss=0.0002616, whisper_loss=0.09566, over 3846285.11 frames. ], batch size: 89, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:27:40,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=527770.0, ans=0.125
2024-08-10 11:27:46,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=527870.0, ans=0.125
2024-08-10 11:27:50,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=527870.0, ans=0.125
2024-08-10 11:27:55,667 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS
2024-08-10 11:28:02,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.982e+01 3.468e+01 4.140e+01 6.249e+01, threshold=6.936e+01, percent-clipped=0.0
2024-08-10 11:28:14,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=528070.0, ans=0.1
2024-08-10 11:28:19,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=528070.0, ans=0.125
2024-08-10 11:28:25,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0
2024-08-10 11:28:27,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5
2024-08-10 11:28:32,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=528170.0, ans=0.125
2024-08-10 11:28:34,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=528170.0, ans=0.125
2024-08-10 11:28:38,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=528170.0, ans=0.125
2024-08-10 11:28:39,830 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 12 from Vox, 32 from AS
2024-08-10 11:28:41,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9350, loss[loss=0.126, beats_loss=0.01224, ecapa_loss=0.0002549, whisper_loss=0.1112, over 18247.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01198, ecapa_loss=0.0002615, whisper_loss=0.09616, over 3846921.68 frames. ], batch size: 73, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:28:42,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=528270.0, ans=0.125
2024-08-10 11:29:14,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2024-08-10 11:29:19,484 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 from AS
2024-08-10 11:29:21,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=528470.0, ans=0.125
2024-08-10 11:29:37,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=528570.0, ans=0.125
2024-08-10 11:29:50,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=528670.0, ans=0.125
2024-08-10 11:29:53,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9400, loss[loss=0.1163, beats_loss=0.01203, ecapa_loss=0.0002698, whisper_loss=0.1016, over 23126.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.012, ecapa_loss=0.000262, whisper_loss=0.09612, over 3876837.88 frames. ], batch size: 95, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:30:05,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=528770.0, ans=0.95
2024-08-10 11:30:13,161 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 16 from Vox, 47 from AS
2024-08-10 11:30:18,879 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts.
17 from LS+wenet, 17 from Vox, 24 from AS
2024-08-10 11:30:31,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.405e+01 3.113e+01 3.432e+01 4.042e+01 8.997e+01, threshold=6.863e+01, percent-clipped=2.0
2024-08-10 11:30:45,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=529070.0, ans=0.5
2024-08-10 11:30:46,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=529070.0, ans=0.125
2024-08-10 11:30:51,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=529070.0, ans=0.125
2024-08-10 11:31:01,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=529170.0, ans=0.125
2024-08-10 11:31:06,922 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 from AS
2024-08-10 11:31:10,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9450, loss[loss=0.127, beats_loss=0.01279, ecapa_loss=0.0002424, whisper_loss=0.1118, over 23217.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.012, ecapa_loss=0.0002627, whisper_loss=0.09588, over 3851485.90 frames. ], batch size: 92, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:31:18,321 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 from AS
2024-08-10 11:31:51,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=529470.0, ans=0.125
2024-08-10 11:31:55,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=529570.0, ans=0.0
2024-08-10 11:31:59,683 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 from AS
2024-08-10 11:32:01,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5
2024-08-10 11:32:09,894 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.044e+01
2024-08-10 11:32:27,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9500, loss[loss=0.1143, beats_loss=0.01355, ecapa_loss=0.0002319, whisper_loss=0.09844, over 14549.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01202, ecapa_loss=0.0002646, whisper_loss=0.09559, over 3868094.04 frames. ], batch size: 57, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:32:30,683 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 from AS
2024-08-10 11:32:33,794 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS
2024-08-10 11:32:38,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=529770.0, ans=0.0
2024-08-10 11:32:39,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=529770.0, ans=0.125
2024-08-10 11:32:53,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=529870.0, ans=0.125
2024-08-10 11:32:53,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=529870.0, ans=0.0
2024-08-10 11:32:54,753 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 20 from Vox, 23 from AS
2024-08-10 11:33:00,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+01 2.898e+01 3.217e+01 3.700e+01 5.976e+01, threshold=6.434e+01, percent-clipped=0.0
2024-08-10 11:33:02,462 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 from AS
2024-08-10 11:33:07,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529970.0, ans=0.0
2024-08-10 11:33:08,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=529970.0, ans=0.0
2024-08-10 11:33:11,608 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-10 11:33:13,848 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS
2024-08-10 11:33:18,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=530070.0, ans=0.2
2024-08-10 11:33:21,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=530070.0, ans=0.125
2024-08-10 11:33:27,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=530170.0, ans=0.125
2024-08-10 11:33:29,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0
2024-08-10 11:33:38,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9550, loss[loss=0.1087, beats_loss=0.009029, ecapa_loss=0.0002962, whisper_loss=0.09675, over 16861.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.012, ecapa_loss=0.0002643, whisper_loss=0.09577, over 3839550.89 frames.
], batch size: 72, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:33:42,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=530270.0, ans=0.025 2024-08-10 11:33:50,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=530370.0, ans=0.125 2024-08-10 11:33:53,258 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 11:34:15,827 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 11:34:27,006 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 11:34:33,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=530670.0, ans=0.125 2024-08-10 11:34:45,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9600, loss[loss=0.1134, beats_loss=0.01337, ecapa_loss=0.000263, whisper_loss=0.09745, over 22544.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.012, ecapa_loss=0.0002626, whisper_loss=0.09617, over 3811204.09 frames. ], batch size: 92, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:34:59,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=530870.0, ans=0.2 2024-08-10 11:35:16,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.859e+01 3.374e+01 4.050e+01 6.854e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 11:35:18,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=530970.0, ans=0.125 2024-08-10 11:35:21,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.62 vs. 
limit=22.5 2024-08-10 11:35:24,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531070.0, ans=0.125 2024-08-10 11:35:25,751 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-10 11:35:42,849 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 11:35:44,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-10 11:35:50,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=531270.0, ans=0.125 2024-08-10 11:35:51,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9650, loss[loss=0.07025, beats_loss=0.01228, ecapa_loss=0.0002351, whisper_loss=0.05562, over 14570.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01204, ecapa_loss=0.0002613, whisper_loss=0.09602, over 3830447.85 frames. ], batch size: 58, lr: 1.47e-02, grad_scale: 2147483648.0 2024-08-10 11:35:54,220 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 11:35:55,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=531270.0, ans=0.125 2024-08-10 11:36:14,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531370.0, ans=0.125 2024-08-10 11:36:29,414 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 11:36:32,999 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 11:36:33,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. 
limit=22.5
2024-08-10 11:36:40,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=531570.0, ans=0.125
2024-08-10 11:36:55,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9700, loss[loss=0.1409, beats_loss=0.0095, ecapa_loss=0.0003034, whisper_loss=0.1284, over 20385.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01205, ecapa_loss=0.0002625, whisper_loss=0.09574, over 3813006.53 frames. ], batch size: 82, lr: 1.47e-02, grad_scale: 2147483648.0
2024-08-10 11:36:56,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=531770.0, ans=10.0
2024-08-10 11:37:17,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=531870.0, ans=0.0
2024-08-10 11:37:17,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0
2024-08-10 11:37:18,410 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 from AS
2024-08-10 11:37:24,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.979e+01 3.402e+01 3.794e+01 6.549e+01, threshold=6.804e+01, percent-clipped=0.0
2024-08-10 11:37:32,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0
2024-08-10 11:37:32,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs.
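The optim.py clipping records fit a simple pattern: the five numbers after "grad-norm quartiles" read as min/25%/median/75%/max of recent gradient norms, and the reported threshold equals Clipping_scale times the median (here 2.0 × 3.402e+01 = 6.804e+01). A sketch of that bookkeeping, assuming this interpretation; the function names are ours, not from optim.py:

```python
import statistics

def clip_threshold(grad_norms, clipping_scale=2.0):
    """Clipping threshold = clipping_scale * median of recent grad norms."""
    return clipping_scale * statistics.median(grad_norms)

def percent_clipped(grad_norms, threshold):
    """Percentage of gradient norms that exceed the threshold."""
    over = sum(1 for g in grad_norms if g > threshold)
    return 100.0 * over / len(grad_norms)

# The quartile summary logged at 11:37:24 (min, q1, median, q3, max):
norms = [22.70, 29.79, 34.02, 37.94, 65.49]
threshold = clip_threshold(norms)  # 2.0 * 34.02 = 68.04, matching threshold=6.804e+01
```

With the maximum norm (65.49) below the threshold, percent-clipped comes out 0.0, matching the log.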
limit=10.0
2024-08-10 11:37:34,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532070.0, ans=0.1
2024-08-10 11:37:36,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=532070.0, ans=0.125
2024-08-10 11:37:45,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=532170.0, ans=0.025
2024-08-10 11:38:00,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9750, loss[loss=0.09851, beats_loss=0.01378, ecapa_loss=0.0002779, whisper_loss=0.08195, over 18465.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01214, ecapa_loss=0.0002625, whisper_loss=0.09497, over 3821785.47 frames. ], batch size: 78, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:38:03,064 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS
2024-08-10 11:38:07,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=532270.0, ans=0.125
2024-08-10 11:38:10,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=15.0
2024-08-10 11:38:12,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=532370.0, ans=0.125
2024-08-10 11:38:28,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=532470.0, ans=0.0
2024-08-10 11:38:32,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.23 vs.
limit=15.0
2024-08-10 11:38:54,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=532670.0, ans=0.125
2024-08-10 11:39:03,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0
2024-08-10 11:39:06,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9800, loss[loss=0.122, beats_loss=0.01108, ecapa_loss=0.0002921, whisper_loss=0.1079, over 13585.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01214, ecapa_loss=0.0002623, whisper_loss=0.09512, over 3806310.95 frames. ], batch size: 55, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:39:27,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=532870.0, ans=0.125
2024-08-10 11:39:36,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.402e+01 3.053e+01 3.396e+01 3.815e+01 6.772e+01, threshold=6.792e+01, percent-clipped=0.0
2024-08-10 11:39:44,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2024-08-10 11:39:54,757 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.187e+05
2024-08-10 11:40:08,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=533170.0, ans=0.125
2024-08-10 11:40:10,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0
2024-08-10 11:40:11,249 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9850, loss[loss=0.08849, beats_loss=0.01388, ecapa_loss=0.0002898, whisper_loss=0.07172, over 16410.00 frames.
], tot_loss[loss=0.1104, beats_loss=0.01214, ecapa_loss=0.0002637, whisper_loss=0.09562, over 3866201.86 frames. ], batch size: 67, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:40:26,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=533370.0, ans=0.125
2024-08-10 11:40:29,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0
2024-08-10 11:40:37,216 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 23 from Vox, 29 from AS
2024-08-10 11:40:41,143 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 24 from Vox, 33 from AS
2024-08-10 11:40:41,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533470.0, ans=0.125
2024-08-10 11:40:41,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=533470.0, ans=0.0
2024-08-10 11:40:42,321 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 from AS
2024-08-10 11:40:47,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=533470.0, ans=0.025
2024-08-10 11:40:57,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=533570.0, ans=0.0
2024-08-10 11:41:05,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=533670.0, ans=0.125
2024-08-10 11:41:10,223 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts.
19 from LS+wenet, 10 from Vox, 26 from AS
2024-08-10 11:41:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0
2024-08-10 11:41:15,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9900, loss[loss=0.0988, beats_loss=0.01294, ecapa_loss=0.0002903, whisper_loss=0.08296, over 20528.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01221, ecapa_loss=0.0002619, whisper_loss=0.09547, over 3873572.99 frames. ], batch size: 83, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:41:18,498 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 30 from Vox, 34 from AS
2024-08-10 11:41:22,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=533770.0, ans=0.125
2024-08-10 11:41:31,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=533870.0, ans=0.125
2024-08-10 11:41:36,966 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 from AS
2024-08-10 11:41:41,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=533970.0, ans=0.025
2024-08-10 11:41:45,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.261e+01 2.867e+01 3.339e+01 3.780e+01 5.864e+01, threshold=6.678e+01, percent-clipped=0.0
2024-08-10 11:41:46,018 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 from AS
2024-08-10 11:41:49,940 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 from AS
2024-08-10 11:41:51,130 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
30 from LS+wenet, 19 from Vox, 39 from AS
2024-08-10 11:41:51,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533970.0, ans=0.125
2024-08-10 11:42:00,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=534070.0, ans=0.025
2024-08-10 11:42:06,674 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 18 from Vox, 45 from AS
2024-08-10 11:42:14,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534170.0, ans=0.125
2024-08-10 11:42:16,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0
2024-08-10 11:42:20,688 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 9950, loss[loss=0.1125, beats_loss=0.01331, ecapa_loss=0.0001671, whisper_loss=0.09747, over 20640.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01225, ecapa_loss=0.0002615, whisper_loss=0.0952, over 3904883.55 frames. ], batch size: 75, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:42:22,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0
2024-08-10 11:42:24,426 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS
2024-08-10 11:43:25,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10000, loss[loss=0.1149, beats_loss=0.01069, ecapa_loss=0.0002722, whisper_loss=0.1015, over 22013.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01219, ecapa_loss=0.000261, whisper_loss=0.09524, over 3861528.15 frames.
], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:43:46,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=534870.0, ans=0.0
2024-08-10 11:43:46,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=534870.0, ans=0.0
2024-08-10 11:43:50,164 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 14 from Vox, 17 from AS
2024-08-10 11:43:52,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=15.0
2024-08-10 11:43:55,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.932e+01 3.270e+01 3.845e+01 5.958e+01, threshold=6.541e+01, percent-clipped=0.0
2024-08-10 11:43:59,748 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 16 from Vox, 37 from AS
2024-08-10 11:44:01,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0
2024-08-10 11:44:11,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=535070.0, ans=0.125
2024-08-10 11:44:30,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10050, loss[loss=0.1045, beats_loss=0.01281, ecapa_loss=0.0002047, whisper_loss=0.08965, over 16426.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0122, ecapa_loss=0.0002592, whisper_loss=0.09511, over 3867281.13 frames. ], batch size: 62, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:44:35,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0
2024-08-10 11:44:56,789 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts.
14 from LS+wenet, 16 from Vox, 28 from AS
2024-08-10 11:45:03,610 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 from AS
2024-08-10 11:45:29,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=535670.0, ans=0.125
2024-08-10 11:45:35,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10100, loss[loss=0.1103, beats_loss=0.01139, ecapa_loss=0.0002687, whisper_loss=0.09625, over 22705.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01219, ecapa_loss=0.0002573, whisper_loss=0.0956, over 3878498.82 frames. ], batch size: 92, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:45:36,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0
2024-08-10 11:45:46,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0
2024-08-10 11:45:53,129 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS
2024-08-10 11:46:05,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.225e+01 3.067e+01 3.527e+01 4.291e+01 1.159e+02, threshold=7.053e+01, percent-clipped=2.0
2024-08-10 11:46:32,991 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 from AS
2024-08-10 11:46:40,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10150, loss[loss=0.1086, beats_loss=0.01142, ecapa_loss=0.000267, whisper_loss=0.09446, over 20698.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.0121, ecapa_loss=0.0002595, whisper_loss=0.09624, over 3903186.21 frames.
], batch size: 86, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:46:41,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2024-08-10 11:46:42,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.30 vs. limit=22.5
2024-08-10 11:46:42,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0
2024-08-10 11:46:44,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0
2024-08-10 11:46:46,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536270.0, ans=0.0
2024-08-10 11:46:53,059 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 from AS
2024-08-10 11:47:04,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=536370.0, ans=0.05
2024-08-10 11:47:35,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=536570.0, ans=0.1
2024-08-10 11:47:43,700 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.361e+00
2024-08-10 11:47:51,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=536670.0, ans=0.2
2024-08-10 11:47:57,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10200, loss[loss=0.1097, beats_loss=0.01119, ecapa_loss=0.000267, whisper_loss=0.09583, over 21357.00 frames.
], tot_loss[loss=0.1114, beats_loss=0.01205, ecapa_loss=0.0002605, whisper_loss=0.09679, over 3941288.14 frames. ], batch size: 87, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:48:01,361 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 from AS
2024-08-10 11:48:13,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=536870.0, ans=0.5
2024-08-10 11:48:34,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.383e+01 3.065e+01 3.411e+01 3.914e+01 6.071e+01, threshold=6.821e+01, percent-clipped=0.0
2024-08-10 11:48:38,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=536970.0, ans=0.0
2024-08-10 11:48:52,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=537070.0, ans=0.1
2024-08-10 11:49:02,854 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 from AS
2024-08-10 11:49:20,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10250, loss[loss=0.1097, beats_loss=0.01145, ecapa_loss=0.0002914, whisper_loss=0.09534, over 22977.00 frames. ], tot_loss[loss=0.1119, beats_loss=0.01203, ecapa_loss=0.000263, whisper_loss=0.09725, over 3936599.62 frames. ], batch size: 91, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:49:20,296 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
28 from LS+wenet, 24 from Vox, 38 from AS
2024-08-10 11:49:22,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=537270.0, ans=0.125
2024-08-10 11:49:34,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=537270.0, ans=0.125
2024-08-10 11:49:42,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=537370.0, ans=0.125
2024-08-10 11:49:44,822 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 from AS
2024-08-10 11:49:48,602 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 20 from Vox, 46 from AS
2024-08-10 11:50:27,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=537670.0, ans=0.125
2024-08-10 11:50:44,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10300, loss[loss=0.1123, beats_loss=0.009232, ecapa_loss=0.0003146, whisper_loss=0.09989, over 14640.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01205, ecapa_loss=0.0002627, whisper_loss=0.09694, over 3949323.98 frames. ], batch size: 58, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:50:45,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=537770.0, ans=0.95
2024-08-10 11:51:03,920 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 from AS
2024-08-10 11:51:20,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 3.115e+01 3.523e+01 4.089e+01 1.199e+02, threshold=7.045e+01, percent-clipped=1.0
2024-08-10 11:51:24,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs.
limit=15.0
2024-08-10 11:51:29,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537970.0, ans=0.125
2024-08-10 11:51:43,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=538070.0, ans=0.125
2024-08-10 11:51:47,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=538170.0, ans=0.125
2024-08-10 11:51:58,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=538170.0, ans=0.125
2024-08-10 11:52:04,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10350, loss[loss=0.1082, beats_loss=0.01347, ecapa_loss=0.0002502, whisper_loss=0.09222, over 22613.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01208, ecapa_loss=0.0002629, whisper_loss=0.097, over 3939290.35 frames. ], batch size: 91, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:52:04,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.65 vs. limit=22.5
2024-08-10 11:52:12,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=538270.0, ans=0.0
2024-08-10 11:52:14,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538270.0, ans=0.1
2024-08-10 11:52:17,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=538270.0, ans=0.0
2024-08-10 11:52:20,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=538370.0, ans=0.125
2024-08-10 11:52:31,621 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
17 from LS+wenet, 16 from Vox, 39 from AS
2024-08-10 12:53:06,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=538570.0, ans=0.125
2024-08-10 11:53:13,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=538670.0, ans=0.04949747468305833
2024-08-10 11:53:25,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10400, loss[loss=0.1214, beats_loss=0.01277, ecapa_loss=0.0002046, whisper_loss=0.1066, over 23677.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01209, ecapa_loss=0.0002619, whisper_loss=0.09631, over 3917785.41 frames. ], batch size: 88, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:53:56,842 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS
2024-08-10 11:54:01,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.334e+01 2.892e+01 3.209e+01 3.631e+01 5.476e+01, threshold=6.418e+01, percent-clipped=0.0
2024-08-10 11:54:08,112 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 from AS
2024-08-10 11:54:20,646 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS
2024-08-10 11:54:29,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0
2024-08-10 11:54:35,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=539170.0, ans=0.0
2024-08-10 11:54:44,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10450, loss[loss=0.085, beats_loss=0.01381, ecapa_loss=0.0003029, whisper_loss=0.06816, over 15101.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01215, ecapa_loss=0.000261, whisper_loss=0.0949, over 3877122.15 frames.
], batch size: 63, lr: 1.46e-02, grad_scale: 2147483648.0
2024-08-10 11:54:51,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
2024-08-10 11:54:55,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2024-08-10 11:55:04,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=539370.0, ans=0.1
2024-08-10 11:55:06,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=539370.0, ans=0.125
2024-08-10 11:55:08,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0
2024-08-10 11:55:12,552 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 from AS
2024-08-10 11:55:20,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=539470.0, ans=0.125
2024-08-10 11:55:26,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539470.0, ans=0.1
2024-08-10 11:55:40,572 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-10 11:55:42,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=539570.0, ans=0.125
2024-08-10 11:56:01,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10500, loss[loss=0.1088, beats_loss=0.009523, ecapa_loss=0.0003169, whisper_loss=0.09609, over 14424.00 frames.
], tot_loss[loss=0.1099, beats_loss=0.01208, ecapa_loss=0.0002633, whisper_loss=0.09516, over 3861678.81 frames. ], batch size: 57, lr: 1.45e-02, grad_scale: 2147483648.0
2024-08-10 11:56:24,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=539870.0, ans=0.125
2024-08-10 11:56:33,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 3.038e+01 3.459e+01 3.996e+01 6.342e+01, threshold=6.919e+01, percent-clipped=0.0
2024-08-10 11:56:39,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=539970.0, ans=0.0
2024-08-10 11:56:42,162 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS
2024-08-10 11:56:44,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=540070.0, ans=0.125
2024-08-10 11:56:52,623 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 19 from Vox, 23 from AS
2024-08-10 11:56:55,356 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 from AS
2024-08-10 11:56:58,083 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 from AS
2024-08-10 11:57:05,509 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 from AS
2024-08-10 11:57:09,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10550, loss[loss=0.1157, beats_loss=0.01398, ecapa_loss=0.0002735, whisper_loss=0.099, over 20400.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0121, ecapa_loss=0.0002617, whisper_loss=0.09542, over 3889624.11 frames.
], batch size: 81, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:57:10,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=540270.0, ans=0.125
2024-08-10 11:57:12,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0
2024-08-10 11:57:20,383 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 from AS
2024-08-10 11:57:27,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0
2024-08-10 11:57:33,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=540370.0, ans=0.125
2024-08-10 11:57:57,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540570.0, ans=0.125
2024-08-10 11:58:09,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=540670.0, ans=0.2
2024-08-10 11:58:18,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10600, loss[loss=0.1092, beats_loss=0.01431, ecapa_loss=0.0002079, whisper_loss=0.09286, over 22783.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01213, ecapa_loss=0.000262, whisper_loss=0.09621, over 3914325.24 frames. ], batch size: 92, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:58:25,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=540770.0, ans=0.0
2024-08-10 11:58:36,371 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 from AS
2024-08-10 11:58:45,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs.
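Between batch 10500 and batch 10550 the logged grad_scale doubles from 2147483648.0 (2^31) to 4294967296.0 (2^32). With use_amp=True this matches the usual dynamic loss-scaling policy: the scale is multiplied by a growth factor after a run of overflow-free steps and backed off on overflow. A minimal sketch of that policy; the parameter names follow PyTorch's `torch.cuda.amp.GradScaler` defaults, but the class itself is our illustration, not the training code:

```python
class SimpleLossScaler:
    """Dynamic loss scaling: grow after `growth_interval` clean steps, back off on overflow."""

    def __init__(self, init_scale=2.0 ** 31, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf):
        if found_inf:
            # Overflow detected: shrink the scale and restart the clean-step count.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # A long run without overflow: grow, e.g. 2**31 -> 2**32.
                self.scale *= self.growth_factor
                self._good_steps = 0
```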
limit=15.0
2024-08-10 11:58:46,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=540970.0, ans=0.0
2024-08-10 11:58:49,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.966e+01 3.321e+01 3.773e+01 6.212e+01, threshold=6.641e+01, percent-clipped=0.0
2024-08-10 11:59:04,732 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 16 from Vox, 44 from AS
2024-08-10 11:59:08,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=541070.0, ans=0.125
2024-08-10 11:59:12,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0
2024-08-10 11:59:16,086 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 20 from Vox, 31 from AS
2024-08-10 11:59:24,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0
2024-08-10 11:59:25,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10650, loss[loss=0.1209, beats_loss=0.01272, ecapa_loss=0.0002393, whisper_loss=0.1058, over 23703.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01215, ecapa_loss=0.0002582, whisper_loss=0.09629, over 3918237.73 frames.
], batch size: 94, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 11:59:25,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=541270.0, ans=0.0
2024-08-10 11:59:46,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=541370.0, ans=0.125
2024-08-10 12:00:01,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=541470.0, ans=0.125
2024-08-10 12:00:14,126 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 from AS
2024-08-10 12:00:18,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=541670.0, ans=0.125
2024-08-10 12:00:27,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=541670.0, ans=0.0
2024-08-10 12:00:31,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10700, loss[loss=0.08784, beats_loss=0.01326, ecapa_loss=0.0002505, whisper_loss=0.07207, over 22275.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01211, ecapa_loss=0.0002562, whisper_loss=0.09676, over 3910843.73 frames. ], batch size: 90, lr: 1.45e-02, grad_scale: 4294967296.0
2024-08-10 12:00:32,873 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 from AS
2024-08-10 12:00:45,449 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 31 from Vox, 29 from AS
2024-08-10 12:00:51,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=541870.0, ans=0.07
2024-08-10 12:00:51,909 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts.
17 from LS+wenet, 15 from Vox, 31 from AS 2024-08-10 12:00:56,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541870.0, ans=0.125 2024-08-10 12:01:03,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+01 3.156e+01 3.555e+01 4.088e+01 6.627e+01, threshold=7.109e+01, percent-clipped=0.0 2024-08-10 12:01:07,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541970.0, ans=0.1 2024-08-10 12:01:07,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=541970.0, ans=0.125 2024-08-10 12:01:15,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=542070.0, ans=0.125 2024-08-10 12:01:18,079 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 12:01:18,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. 
limit=6.0 2024-08-10 12:01:21,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=542070.0, ans=0.125 2024-08-10 12:01:29,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=542170.0, ans=0.125 2024-08-10 12:01:31,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=542170.0, ans=0.2 2024-08-10 12:01:35,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=542170.0, ans=0.025 2024-08-10 12:01:39,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10750, loss[loss=0.1194, beats_loss=0.01395, ecapa_loss=0.0002438, whisper_loss=0.103, over 22058.00 frames. ], tot_loss[loss=0.1114, beats_loss=0.01207, ecapa_loss=0.0002574, whisper_loss=0.09681, over 3919650.43 frames. ], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:01:58,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=542370.0, ans=0.0 2024-08-10 12:02:04,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=542370.0, ans=0.025 2024-08-10 12:02:06,797 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 12:02:14,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-10 12:02:17,178 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 12:02:24,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.30 vs. 
limit=15.0 2024-08-10 12:02:33,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-10 12:02:45,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10800, loss[loss=0.1351, beats_loss=0.01149, ecapa_loss=0.0002399, whisper_loss=0.1212, over 23823.00 frames. ], tot_loss[loss=0.1121, beats_loss=0.01207, ecapa_loss=0.0002569, whisper_loss=0.09742, over 3919366.64 frames. ], batch size: 92, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:03:02,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-08-10 12:03:17,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+01 3.065e+01 3.589e+01 4.278e+01 6.968e+01, threshold=7.178e+01, percent-clipped=0.0 2024-08-10 12:03:24,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=542970.0, ans=0.2 2024-08-10 12:03:47,749 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-10 12:03:51,600 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 14 from Vox, 34 from AS 2024-08-10 12:03:53,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10850, loss[loss=0.1202, beats_loss=0.01051, ecapa_loss=0.000305, whisper_loss=0.1066, over 22982.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01207, ecapa_loss=0.0002567, whisper_loss=0.09719, over 3917160.51 frames. ], batch size: 92, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:04:09,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-10 12:04:18,696 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 12:04:18,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=543370.0, ans=0.0 2024-08-10 12:04:22,480 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 12:04:26,686 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 from AS 2024-08-10 12:04:30,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.59 vs. limit=10.0 2024-08-10 12:04:42,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=543570.0, ans=0.0 2024-08-10 12:04:55,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=543670.0, ans=0.1 2024-08-10 12:05:02,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=543770.0, ans=0.02 2024-08-10 12:05:03,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10900, loss[loss=0.1203, beats_loss=0.01213, ecapa_loss=0.0002941, whisper_loss=0.1052, over 21483.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01204, ecapa_loss=0.0002573, whisper_loss=0.09673, over 3893305.56 frames. 
], batch size: 89, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:05:03,504 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.931e+00 2024-08-10 12:05:35,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 3.064e+01 3.469e+01 3.864e+01 6.688e+01, threshold=6.938e+01, percent-clipped=0.0 2024-08-10 12:06:09,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=544170.0, ans=0.125 2024-08-10 12:06:11,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=544170.0, ans=0.0 2024-08-10 12:06:13,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 10950, loss[loss=0.1058, beats_loss=0.01014, ecapa_loss=0.0003185, whisper_loss=0.09245, over 18308.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01206, ecapa_loss=0.0002566, whisper_loss=0.09708, over 3888111.55 frames. ], batch size: 75, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:06:25,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544370.0, ans=0.1 2024-08-10 12:06:29,456 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 29 from Vox, 40 from AS 2024-08-10 12:06:39,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=544470.0, ans=0.1 2024-08-10 12:06:40,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. 
limit=15.0 2024-08-10 12:07:05,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=544670.0, ans=0.015 2024-08-10 12:07:14,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=544670.0, ans=0.2 2024-08-10 12:07:17,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=544670.0, ans=0.125 2024-08-10 12:07:19,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11000, loss[loss=0.1229, beats_loss=0.008036, ecapa_loss=0.00034, whisper_loss=0.1115, over 16320.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01196, ecapa_loss=0.0002593, whisper_loss=0.09709, over 3862133.15 frames. ], batch size: 63, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:07:38,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=544870.0, ans=0.0 2024-08-10 12:07:38,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=544870.0, ans=0.125 2024-08-10 12:07:44,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=544870.0, ans=0.125 2024-08-10 12:07:50,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.281e+01 2.903e+01 3.309e+01 3.802e+01 5.297e+01, threshold=6.618e+01, percent-clipped=0.0 2024-08-10 12:07:50,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-10 12:07:52,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. 
limit=15.0 2024-08-10 12:08:01,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=545070.0, ans=0.125 2024-08-10 12:08:25,652 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11050, loss[loss=0.1123, beats_loss=0.01164, ecapa_loss=0.0002491, whisper_loss=0.09821, over 22364.00 frames. ], tot_loss[loss=0.1117, beats_loss=0.01191, ecapa_loss=0.0002608, whisper_loss=0.09721, over 3900030.87 frames. ], batch size: 88, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:08:32,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=15.0 2024-08-10 12:08:35,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=545270.0, ans=0.125 2024-08-10 12:08:37,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=545270.0, ans=0.125 2024-08-10 12:08:57,424 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 12:09:16,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=545570.0, ans=0.125 2024-08-10 12:09:22,447 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 27 from Vox, 28 from AS 2024-08-10 12:09:31,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11100, loss[loss=0.1009, beats_loss=0.01178, ecapa_loss=0.0002969, whisper_loss=0.08618, over 21072.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01194, ecapa_loss=0.0002606, whisper_loss=0.09706, over 3920270.11 frames. 
], batch size: 88, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:09:35,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545770.0, ans=0.1 2024-08-10 12:09:39,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=545770.0, ans=0.125 2024-08-10 12:09:52,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=545870.0, ans=0.125 2024-08-10 12:09:59,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=22.5 2024-08-10 12:10:02,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 3.025e+01 3.487e+01 4.357e+01 7.811e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:10:07,793 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 28 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 12:10:18,814 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 12:10:34,619 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 12:10:38,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11150, loss[loss=0.1048, beats_loss=0.01279, ecapa_loss=0.0002532, whisper_loss=0.08945, over 20908.00 frames. ], tot_loss[loss=0.112, beats_loss=0.0119, ecapa_loss=0.0002599, whisper_loss=0.0975, over 3909352.16 frames. 
], batch size: 81, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:10:38,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=546270.0, ans=0.125 2024-08-10 12:10:42,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=546270.0, ans=0.125 2024-08-10 12:10:59,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=12.0 2024-08-10 12:11:00,012 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 26 from Vox, 27 from AS 2024-08-10 12:11:05,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=546470.0, ans=0.0 2024-08-10 12:11:08,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=546470.0, ans=0.125 2024-08-10 12:11:11,673 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 12:11:25,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=546570.0, ans=0.2 2024-08-10 12:11:26,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=546570.0, ans=0.125 2024-08-10 12:11:28,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=546570.0, ans=0.125 2024-08-10 12:11:31,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=546670.0, ans=0.1 2024-08-10 12:11:31,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=546670.0, ans=0.0 2024-08-10 12:11:37,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-08-10 12:11:42,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=546670.0, ans=0.0 2024-08-10 12:11:44,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11200, loss[loss=0.1157, beats_loss=0.01052, ecapa_loss=0.0002769, whisper_loss=0.1024, over 17555.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01185, ecapa_loss=0.0002618, whisper_loss=0.09734, over 3922675.34 frames. 
], batch size: 71, lr: 1.45e-02, grad_scale: 4294967296.0 2024-08-10 12:12:13,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=546970.0, ans=0.125 2024-08-10 12:12:15,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.205e+01 3.126e+01 3.422e+01 3.938e+01 7.786e+01, threshold=6.843e+01, percent-clipped=1.0 2024-08-10 12:12:15,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=546970.0, ans=0.0 2024-08-10 12:12:39,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547170.0, ans=0.125 2024-08-10 12:12:40,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=547170.0, ans=0.125 2024-08-10 12:12:46,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=547170.0, ans=0.0 2024-08-10 12:12:51,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11250, loss[loss=0.08777, beats_loss=0.01387, ecapa_loss=0.0002721, whisper_loss=0.07118, over 14269.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01191, ecapa_loss=0.0002618, whisper_loss=0.09699, over 3893705.42 frames. ], batch size: 61, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:13:07,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=12.0 2024-08-10 12:13:30,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547470.0, ans=0.1 2024-08-10 12:13:42,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.26 vs. 
limit=15.0 2024-08-10 12:13:58,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11300, loss[loss=0.1217, beats_loss=0.01335, ecapa_loss=0.0002084, whisper_loss=0.1062, over 21874.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01192, ecapa_loss=0.0002613, whisper_loss=0.097, over 3903493.22 frames. ], batch size: 85, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:14:00,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547770.0, ans=0.1 2024-08-10 12:14:11,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=547870.0, ans=0.125 2024-08-10 12:14:22,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=547870.0, ans=0.125 2024-08-10 12:14:30,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.381e+01 3.068e+01 3.483e+01 4.119e+01 9.369e+01, threshold=6.966e+01, percent-clipped=1.0 2024-08-10 12:14:37,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547970.0, ans=0.1 2024-08-10 12:14:45,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=548070.0, ans=0.0 2024-08-10 12:14:49,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=548070.0, ans=0.2 2024-08-10 12:14:52,935 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
35 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 12:14:54,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=548170.0, ans=0.035 2024-08-10 12:15:05,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11350, loss[loss=0.1332, beats_loss=0.00958, ecapa_loss=0.0002929, whisper_loss=0.1207, over 20901.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01186, ecapa_loss=0.0002598, whisper_loss=0.09688, over 3895760.57 frames. ], batch size: 80, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:15:07,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=548270.0, ans=0.125 2024-08-10 12:15:08,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=548270.0, ans=0.125 2024-08-10 12:15:15,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=548270.0, ans=0.2 2024-08-10 12:15:16,048 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 12:15:36,030 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.076e+00 2024-08-10 12:15:40,380 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 from AS 2024-08-10 12:15:52,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=548570.0, ans=0.2 2024-08-10 12:15:55,927 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 32 from Vox, 31 from AS 2024-08-10 12:15:56,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. 
limit=15.0 2024-08-10 12:16:10,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=548770.0, ans=0.125 2024-08-10 12:16:11,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11400, loss[loss=0.0929, beats_loss=0.01323, ecapa_loss=0.0003065, whisper_loss=0.07661, over 19383.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01186, ecapa_loss=0.0002597, whisper_loss=0.09707, over 3892695.71 frames. ], batch size: 84, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:16:31,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548870.0, ans=0.1 2024-08-10 12:16:35,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-10 12:16:40,136 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 12:16:42,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+01 2.940e+01 3.301e+01 3.929e+01 5.377e+01, threshold=6.601e+01, percent-clipped=0.0 2024-08-10 12:16:51,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=549070.0, ans=0.125 2024-08-10 12:17:06,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=549170.0, ans=0.125 2024-08-10 12:17:07,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2024-08-10 12:17:15,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=549170.0, ans=0.0 2024-08-10 12:17:16,947 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
34 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 12:17:18,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11450, loss[loss=0.1334, beats_loss=0.009611, ecapa_loss=0.0002744, whisper_loss=0.121, over 21696.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01195, ecapa_loss=0.0002595, whisper_loss=0.09669, over 3894297.57 frames. ], batch size: 87, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:17:23,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=549270.0, ans=0.2 2024-08-10 12:17:40,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=549370.0, ans=0.0 2024-08-10 12:17:53,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=549470.0, ans=0.0 2024-08-10 12:17:59,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=549570.0, ans=0.0 2024-08-10 12:18:05,613 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 16 from Vox, 51 from AS 2024-08-10 12:18:12,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=549670.0, ans=0.0 2024-08-10 12:18:19,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=549670.0, ans=0.2 2024-08-10 12:18:20,739 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-10 12:18:21,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=549670.0, ans=0.2 2024-08-10 12:18:24,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=549670.0, ans=0.0 2024-08-10 12:18:26,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11500, loss[loss=0.0946, beats_loss=0.01266, ecapa_loss=0.0002374, whisper_loss=0.07957, over 18960.00 frames. ], tot_loss[loss=0.111, beats_loss=0.012, ecapa_loss=0.0002592, whisper_loss=0.09643, over 3920487.63 frames. ], batch size: 75, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:18:32,661 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-10 12:18:50,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549870.0, ans=0.125 2024-08-10 12:18:50,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549870.0, ans=0.125 2024-08-10 12:18:55,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2024-08-10 12:18:56,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.942e+01 3.473e+01 3.989e+01 7.170e+01, threshold=6.945e+01, percent-clipped=1.0 2024-08-10 12:18:59,011 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
19 from LS+wenet, 24 from Vox, 30 from AS 2024-08-10 12:19:04,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=550070.0, ans=10.0 2024-08-10 12:19:06,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=550070.0, ans=0.125 2024-08-10 12:19:11,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550070.0, ans=0.1 2024-08-10 12:19:13,697 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 19 from Vox, 19 from AS 2024-08-10 12:19:22,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-10 12:19:23,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=550170.0, ans=0.0 2024-08-10 12:19:23,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=550170.0, ans=0.0 2024-08-10 12:19:28,693 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 12:19:29,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2024-08-10 12:19:32,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11550, loss[loss=0.1063, beats_loss=0.009879, ecapa_loss=0.0002843, whisper_loss=0.0936, over 16910.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01203, ecapa_loss=0.0002599, whisper_loss=0.09633, over 3897867.23 frames. 
], batch size: 66, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:19:52,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=550370.0, ans=0.125 2024-08-10 12:20:01,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=550470.0, ans=0.125 2024-08-10 12:20:18,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=550570.0, ans=0.2 2024-08-10 12:20:31,745 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 12:20:32,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.81 vs. limit=15.0 2024-08-10 12:20:34,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=550670.0, ans=0.125 2024-08-10 12:20:38,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11600, loss[loss=0.1241, beats_loss=0.008748, ecapa_loss=0.0003279, whisper_loss=0.1121, over 15674.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.0121, ecapa_loss=0.0002587, whisper_loss=0.09555, over 3926727.08 frames. 
], batch size: 62, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:20:38,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550770.0, ans=0.1 2024-08-10 12:20:42,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550770.0, ans=0.1 2024-08-10 12:20:45,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550770.0, ans=0.125 2024-08-10 12:20:53,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=550870.0, ans=0.035 2024-08-10 12:21:08,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+01 3.001e+01 3.473e+01 4.016e+01 7.053e+01, threshold=6.947e+01, percent-clipped=1.0 2024-08-10 12:21:13,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=550970.0, ans=0.125 2024-08-10 12:21:29,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=551070.0, ans=0.0 2024-08-10 12:21:39,256 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 13 from Vox, 46 from AS 2024-08-10 12:21:42,370 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.690e-02 2024-08-10 12:21:46,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11650, loss[loss=0.1195, beats_loss=0.0115, ecapa_loss=0.000231, whisper_loss=0.1057, over 23276.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01203, ecapa_loss=0.0002572, whisper_loss=0.09629, over 3889593.07 frames. 
], batch size: 91, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:21:53,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=551270.0, ans=0.0 2024-08-10 12:22:08,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=12.0 2024-08-10 12:22:31,958 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS 2024-08-10 12:22:38,953 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 19 from Vox, 21 from AS 2024-08-10 12:22:42,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551670.0, ans=0.1 2024-08-10 12:22:57,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11700, loss[loss=0.1142, beats_loss=0.01455, ecapa_loss=0.0002146, whisper_loss=0.09746, over 23702.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01208, ecapa_loss=0.0002568, whisper_loss=0.09607, over 3925742.00 frames. ], batch size: 92, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:22:59,234 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 20 from Vox, 41 from AS 2024-08-10 12:23:10,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=551770.0, ans=0.1 2024-08-10 12:23:12,775 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS 2024-08-10 12:23:30,555 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 from AS 2024-08-10 12:23:33,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 3.233e+01 3.487e+01 4.046e+01 6.995e+01, threshold=6.974e+01, percent-clipped=1.0 2024-08-10 12:23:37,867 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
11 from LS+wenet, 19 from Vox, 24 from AS 2024-08-10 12:23:42,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=552070.0, ans=0.125 2024-08-10 12:23:54,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552070.0, ans=0.125 2024-08-10 12:23:57,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552170.0, ans=0.1 2024-08-10 12:24:13,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11750, loss[loss=0.09758, beats_loss=0.01765, ecapa_loss=0.0001697, whisper_loss=0.07823, over 20028.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01219, ecapa_loss=0.0002559, whisper_loss=0.09562, over 3942954.17 frames. ], batch size: 78, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:24:14,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-10 12:24:17,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=552270.0, ans=0.2 2024-08-10 12:24:56,942 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 14 from Vox, 39 from AS 2024-08-10 12:24:57,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. 
limit=12.0 2024-08-10 12:25:16,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=552670.0, ans=0.0 2024-08-10 12:25:20,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=552670.0, ans=0.125 2024-08-10 12:25:28,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5 2024-08-10 12:25:29,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11800, loss[loss=0.0965, beats_loss=0.01404, ecapa_loss=0.0002418, whisper_loss=0.08005, over 16369.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01212, ecapa_loss=0.0002566, whisper_loss=0.09611, over 3951623.76 frames. ], batch size: 67, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:25:30,937 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 12:25:41,107 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
25 from LS+wenet, 19 from Vox, 29 from AS 2024-08-10 12:25:56,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=552870.0, ans=0.125 2024-08-10 12:25:58,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=552970.0, ans=0.0 2024-08-10 12:26:04,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.876e+01 3.467e+01 4.028e+01 7.288e+01, threshold=6.933e+01, percent-clipped=1.0 2024-08-10 12:26:18,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=553070.0, ans=0.0 2024-08-10 12:26:25,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=553070.0, ans=0.125 2024-08-10 12:26:27,004 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 12:26:28,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553170.0, ans=0.1 2024-08-10 12:26:35,999 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 22 from Vox, 41 from AS 2024-08-10 12:26:44,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11850, loss[loss=0.08599, beats_loss=0.01445, ecapa_loss=0.0002604, whisper_loss=0.06894, over 16729.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01214, ecapa_loss=0.0002584, whisper_loss=0.09567, over 3940251.06 frames. ], batch size: 68, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:26:51,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553270.0, ans=0.1 2024-08-10 12:26:56,733 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-10 12:27:02,558 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 34 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 12:27:04,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=12.0 2024-08-10 12:27:05,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=553370.0, ans=0.0 2024-08-10 12:27:25,560 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 13 from Vox, 31 from AS 2024-08-10 12:27:37,216 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 12:27:48,992 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 12:27:57,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11900, loss[loss=0.1046, beats_loss=0.01328, ecapa_loss=0.0003097, whisper_loss=0.08825, over 19154.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01226, ecapa_loss=0.000259, whisper_loss=0.09585, over 3943666.91 frames. ], batch size: 81, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:28:02,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. 
limit=15.0 2024-08-10 12:28:19,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553870.0, ans=0.1 2024-08-10 12:28:30,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 3.028e+01 3.380e+01 3.794e+01 5.730e+01, threshold=6.759e+01, percent-clipped=0.0 2024-08-10 12:28:35,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=553970.0, ans=0.2 2024-08-10 12:28:41,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554070.0, ans=0.1 2024-08-10 12:28:42,537 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS 2024-08-10 12:28:49,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-08-10 12:29:04,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=554170.0, ans=0.125 2024-08-10 12:29:10,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 11950, loss[loss=0.1032, beats_loss=0.0148, ecapa_loss=0.0002588, whisper_loss=0.0858, over 21680.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01221, ecapa_loss=0.0002572, whisper_loss=0.0952, over 3898973.51 frames. ], batch size: 89, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:29:20,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=554270.0, ans=0.0 2024-08-10 12:29:24,334 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 12:29:24,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=554370.0, ans=10.0 2024-08-10 12:29:30,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554370.0, ans=0.1 2024-08-10 12:29:36,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=554370.0, ans=0.07 2024-08-10 12:29:46,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554470.0, ans=0.125 2024-08-10 12:30:05,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554570.0, ans=0.0 2024-08-10 12:30:05,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-08-10 12:30:21,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=554770.0, ans=0.0 2024-08-10 12:30:22,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12000, loss[loss=0.103, beats_loss=0.01409, ecapa_loss=0.0002071, whisper_loss=0.08682, over 21413.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01221, ecapa_loss=0.0002559, whisper_loss=0.09526, over 3906208.15 frames. ], batch size: 87, lr: 1.44e-02, grad_scale: 4294967296.0 2024-08-10 12:30:22,880 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 12:30:59,936 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on ASR_libri: loss=0.2637, beats_loss=0, ecapa_loss=0.0007919, whisper_loss=0.2558, over 922467.00 frames. 
2024-08-10 12:31:17,178 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on SV_voxceleb1: loss=0.006895, beats_loss=0, ecapa_loss=0.0006895, whisper_loss=0, over 939242.00 frames. 2024-08-10 12:33:04,196 INFO [train_multi_KD3.py:1149] (0/4) Epoch 4, validation on AT_audioset: loss=0.02758, beats_loss=0.02758, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 12:33:04,201 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 12:33:11,718 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 12:33:25,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=554870.0, ans=0.125 2024-08-10 12:33:26,491 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 12:33:39,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.407e+01 3.069e+01 3.344e+01 4.078e+01 6.277e+01, threshold=6.688e+01, percent-clipped=0.0 2024-08-10 12:34:10,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555170.0, ans=0.1 2024-08-10 12:34:16,443 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 from AS 2024-08-10 12:34:24,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12050, loss[loss=0.1226, beats_loss=0.01141, ecapa_loss=0.0002698, whisper_loss=0.1085, over 21276.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0122, ecapa_loss=0.0002563, whisper_loss=0.0952, over 3879278.17 frames. 
], batch size: 88, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:34:40,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=555370.0, ans=0.125 2024-08-10 12:34:46,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2024-08-10 12:35:00,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=15.0 2024-08-10 12:35:03,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=555470.0, ans=0.125 2024-08-10 12:35:05,337 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 from AS 2024-08-10 12:35:09,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2024-08-10 12:35:33,046 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-10 12:35:49,823 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12100, loss[loss=0.126, beats_loss=0.009155, ecapa_loss=0.0002936, whisper_loss=0.1139, over 21894.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01211, ecapa_loss=0.00026, whisper_loss=0.09577, over 3871988.32 frames. ], batch size: 87, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:35:51,349 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS 2024-08-10 12:36:05,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. 
limit=15.0 2024-08-10 12:36:10,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-08-10 12:36:12,451 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 from AS 2024-08-10 12:36:30,095 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.425e+01 2.918e+01 3.194e+01 3.735e+01 7.690e+01, threshold=6.389e+01, percent-clipped=2.0 2024-08-10 12:36:47,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=556070.0, ans=0.125 2024-08-10 12:36:51,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556070.0, ans=0.125 2024-08-10 12:37:14,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2024-08-10 12:37:15,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12150, loss[loss=0.09326, beats_loss=0.01374, ecapa_loss=0.0002028, whisper_loss=0.07749, over 17694.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01207, ecapa_loss=0.0002596, whisper_loss=0.09622, over 3898527.75 frames. ], batch size: 69, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:37:31,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-08-10 12:37:35,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=556370.0, ans=0.025 2024-08-10 12:38:03,073 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
25 from LS+wenet, 25 from Vox, 22 from AS 2024-08-10 12:38:24,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=556670.0, ans=0.125 2024-08-10 12:38:31,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=556670.0, ans=0.125 2024-08-10 12:38:34,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12200, loss[loss=0.1154, beats_loss=0.01196, ecapa_loss=0.0002188, whisper_loss=0.1013, over 17351.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01202, ecapa_loss=0.0002595, whisper_loss=0.09629, over 3868451.93 frames. ], batch size: 66, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:38:34,451 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 12:39:10,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.794e+01 3.177e+01 3.515e+01 6.137e+01, threshold=6.354e+01, percent-clipped=0.0 2024-08-10 12:39:17,106 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 12:39:26,207 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 12:39:27,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=557070.0, ans=0.125 2024-08-10 12:39:34,533 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 12:39:54,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12250, loss[loss=0.106, beats_loss=0.0116, ecapa_loss=0.0002562, whisper_loss=0.09181, over 22022.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01199, ecapa_loss=0.0002606, whisper_loss=0.0959, over 3864599.82 frames. 
], batch size: 90, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:39:56,793 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 12:40:17,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=557370.0, ans=0.0 2024-08-10 12:40:42,715 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 from AS 2024-08-10 12:41:04,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=557670.0, ans=0.0 2024-08-10 12:41:14,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12300, loss[loss=0.06075, beats_loss=0.01707, ecapa_loss=0.0002238, whisper_loss=0.04143, over 18100.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01205, ecapa_loss=0.0002608, whisper_loss=0.09542, over 3871731.45 frames. ], batch size: 78, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:41:34,918 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 from AS 2024-08-10 12:41:45,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=557970.0, ans=0.04949747468305833 2024-08-10 12:41:46,753 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 12:41:49,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 3.037e+01 3.524e+01 3.995e+01 1.053e+02, threshold=7.048e+01, percent-clipped=4.0 2024-08-10 12:42:00,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.04 vs. 
limit=15.0 2024-08-10 12:42:22,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=558170.0, ans=0.2 2024-08-10 12:42:27,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=22.5 2024-08-10 12:42:33,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12350, loss[loss=0.1024, beats_loss=0.01237, ecapa_loss=0.0002831, whisper_loss=0.08716, over 21005.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01206, ecapa_loss=0.0002625, whisper_loss=0.09481, over 3885219.57 frames. ], batch size: 87, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:42:51,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=558370.0, ans=0.0 2024-08-10 12:43:13,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558470.0, ans=0.125 2024-08-10 12:43:23,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=558570.0, ans=0.125 2024-08-10 12:43:48,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=558670.0, ans=0.125 2024-08-10 12:43:50,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=558670.0, ans=0.04949747468305833 2024-08-10 12:44:01,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12400, loss[loss=0.08998, beats_loss=0.014, ecapa_loss=0.000248, whisper_loss=0.0735, over 22076.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.012, ecapa_loss=0.0002614, whisper_loss=0.09527, over 3887376.67 frames. 
], batch size: 90, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:44:03,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=558770.0, ans=0.125 2024-08-10 12:44:04,687 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 12:44:13,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=558770.0, ans=15.0 2024-08-10 12:44:21,970 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 31 from Vox, 36 from AS 2024-08-10 12:44:25,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=558870.0, ans=0.0 2024-08-10 12:44:29,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=558870.0, ans=0.0 2024-08-10 12:44:41,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.193e+01 2.951e+01 3.310e+01 3.895e+01 5.650e+01, threshold=6.619e+01, percent-clipped=0.0 2024-08-10 12:44:48,021 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 from AS 2024-08-10 12:45:02,300 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 12:45:26,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12450, loss[loss=0.0949, beats_loss=0.01315, ecapa_loss=0.0002401, whisper_loss=0.07935, over 15649.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01195, ecapa_loss=0.0002607, whisper_loss=0.09565, over 3878901.52 frames. 
], batch size: 62, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:45:38,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=559270.0, ans=0.07 2024-08-10 12:45:58,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-10 12:46:06,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=559470.0, ans=0.0 2024-08-10 12:46:08,310 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 from AS 2024-08-10 12:46:38,024 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 12:46:45,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12500, loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0002899, whisper_loss=0.08935, over 18415.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01189, ecapa_loss=0.0002613, whisper_loss=0.09598, over 3882091.71 frames. ], batch size: 75, lr: 1.43e-02, grad_scale: 4294967296.0 2024-08-10 12:46:53,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=559770.0, ans=0.125 2024-08-10 12:47:01,061 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
23 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 12:47:01,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=559870.0, ans=0.0 2024-08-10 12:47:07,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=559870.0, ans=0.0 2024-08-10 12:47:25,139 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-56000.pt 2024-08-10 12:47:28,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 3.210e+01 3.616e+01 4.037e+01 8.521e+01, threshold=7.231e+01, percent-clipped=2.0 2024-08-10 12:47:39,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=560070.0, ans=0.125 2024-08-10 12:47:39,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=560070.0, ans=0.0 2024-08-10 12:47:54,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=560170.0, ans=0.04949747468305833 2024-08-10 12:47:56,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=560170.0, ans=0.125 2024-08-10 12:48:12,631 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12550, loss[loss=0.1208, beats_loss=0.01028, ecapa_loss=0.0002723, whisper_loss=0.1078, over 21376.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01197, ecapa_loss=0.0002619, whisper_loss=0.09579, over 3919060.25 frames. 
], batch size: 87, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:48:14,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=560270.0, ans=0.2 2024-08-10 12:48:22,152 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 17 from Vox, 41 from AS 2024-08-10 12:48:29,654 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 from AS 2024-08-10 12:48:41,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=560370.0, ans=0.0 2024-08-10 12:48:48,284 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 33 from Vox, 37 from AS 2024-08-10 12:48:52,436 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 12:49:03,076 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 24 from Vox, 21 from AS 2024-08-10 12:49:03,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=560570.0, ans=0.0 2024-08-10 12:49:19,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=560670.0, ans=0.125 2024-08-10 12:49:20,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-08-10 12:49:29,709 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12600, loss[loss=0.1128, beats_loss=0.01096, ecapa_loss=0.0002281, whisper_loss=0.09959, over 15918.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01194, ecapa_loss=0.0002602, whisper_loss=0.09642, over 3904297.14 frames. 
], batch size: 60, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:49:53,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=560870.0, ans=0.125 2024-08-10 12:50:05,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.53 vs. limit=12.0 2024-08-10 12:50:06,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.385e+01 3.110e+01 3.572e+01 4.096e+01 7.155e+01, threshold=7.143e+01, percent-clipped=0.0 2024-08-10 12:50:20,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=561070.0, ans=0.2 2024-08-10 12:50:21,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=561070.0, ans=0.125 2024-08-10 12:50:24,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=561070.0, ans=0.07 2024-08-10 12:50:27,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561070.0, ans=0.1 2024-08-10 12:50:30,415 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 from AS 2024-08-10 12:50:38,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=561170.0, ans=0.125 2024-08-10 12:50:42,407 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 12:50:43,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. 
limit=15.0 2024-08-10 12:50:46,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12650, loss[loss=0.1108, beats_loss=0.01549, ecapa_loss=0.0002208, whisper_loss=0.09314, over 22779.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01198, ecapa_loss=0.0002592, whisper_loss=0.09653, over 3881562.07 frames. ], batch size: 94, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:51:12,927 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 17 from Vox, 22 from AS 2024-08-10 12:51:20,274 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 26 from Vox, 28 from AS 2024-08-10 12:51:22,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-08-10 12:51:38,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-08-10 12:51:41,517 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 12:52:07,739 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 from AS 2024-08-10 12:52:08,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12700, loss[loss=0.1073, beats_loss=0.009918, ecapa_loss=0.000307, whisper_loss=0.09432, over 19517.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01191, ecapa_loss=0.0002603, whisper_loss=0.09626, over 3857593.52 frames. 
], batch size: 82, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:52:38,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=561870.0, ans=0.1 2024-08-10 12:52:38,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=561870.0, ans=0.125 2024-08-10 12:52:44,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=561970.0, ans=0.1 2024-08-10 12:52:44,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.862e+01 3.101e+01 3.673e+01 6.463e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-10 12:52:53,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2024-08-10 12:53:17,388 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.134e-02 2024-08-10 12:53:17,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-10 12:53:26,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12750, loss[loss=0.1246, beats_loss=0.01059, ecapa_loss=0.0002872, whisper_loss=0.1111, over 23584.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01197, ecapa_loss=0.0002609, whisper_loss=0.0961, over 3875816.42 frames. 
], batch size: 92, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:53:32,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562270.0, ans=0.0 2024-08-10 12:53:57,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562470.0, ans=0.1 2024-08-10 12:53:59,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=562470.0, ans=0.09899494936611666 2024-08-10 12:54:02,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-08-10 12:54:03,850 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 25 from Vox, 30 from AS 2024-08-10 12:54:31,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562670.0, ans=0.1 2024-08-10 12:54:42,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12800, loss[loss=0.1154, beats_loss=0.009611, ecapa_loss=0.0002782, whisper_loss=0.103, over 15989.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002614, whisper_loss=0.09625, over 3913256.00 frames. ], batch size: 62, lr: 1.43e-02, grad_scale: 8589934592.0 2024-08-10 12:54:42,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=562770.0, ans=0.125 2024-08-10 12:55:07,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562870.0, ans=0.125 2024-08-10 12:55:13,870 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 12:55:17,465 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.950e+01 3.581e+01 4.034e+01 6.155e+01, threshold=7.162e+01, percent-clipped=0.0 2024-08-10 12:55:19,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=562970.0, ans=0.125 2024-08-10 12:55:21,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=562970.0, ans=0.125 2024-08-10 12:55:25,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=562970.0, ans=0.125 2024-08-10 12:55:36,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=563070.0, ans=0.125 2024-08-10 12:55:38,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=563070.0, ans=0.125 2024-08-10 12:55:39,701 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 12:55:55,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12850, loss[loss=0.1159, beats_loss=0.0134, ecapa_loss=0.0002494, whisper_loss=0.1, over 14301.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01206, ecapa_loss=0.000262, whisper_loss=0.09618, over 3898133.23 frames. 
], batch size: 56, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:56:18,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563370.0, ans=0.0 2024-08-10 12:56:28,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=563470.0, ans=0.125 2024-08-10 12:56:36,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-10 12:56:41,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=563570.0, ans=0.125 2024-08-10 12:56:42,928 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-10 12:56:57,949 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 12:57:01,834 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-10 12:57:04,704 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 12:57:05,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12900, loss[loss=0.1057, beats_loss=0.01183, ecapa_loss=0.0002924, whisper_loss=0.09096, over 19047.00 frames. ], tot_loss[loss=0.11, beats_loss=0.0121, ecapa_loss=0.0002624, whisper_loss=0.0953, over 3875508.56 frames. ], batch size: 78, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:57:07,311 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 12:57:09,968 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
29 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-10 12:57:12,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=563770.0, ans=0.125 2024-08-10 12:57:16,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.52 vs. limit=22.5 2024-08-10 12:57:19,154 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 12:57:23,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=563870.0, ans=0.0 2024-08-10 12:57:32,269 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-10 12:57:38,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.394e+01 2.781e+01 3.198e+01 3.852e+01 6.418e+01, threshold=6.396e+01, percent-clipped=0.0 2024-08-10 12:57:41,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=563970.0, ans=0.1 2024-08-10 12:57:46,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.142e-01 2024-08-10 12:57:55,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564070.0, ans=0.125 2024-08-10 12:57:58,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2024-08-10 12:58:05,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=564170.0, ans=0.0 2024-08-10 12:58:15,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 12950, loss[loss=0.1265, beats_loss=0.008783, ecapa_loss=0.0002965, whisper_loss=0.1148, over 15971.00 frames. 
], tot_loss[loss=0.1109, beats_loss=0.01204, ecapa_loss=0.0002609, whisper_loss=0.09623, over 3870856.85 frames. ], batch size: 62, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:58:26,139 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 12:58:30,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564370.0, ans=0.0 2024-08-10 12:58:39,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=564370.0, ans=0.07 2024-08-10 12:58:49,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=564470.0, ans=0.0 2024-08-10 12:58:50,351 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 12:58:54,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=564470.0, ans=0.125 2024-08-10 12:59:00,789 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 12:59:06,120 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 12:59:07,811 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-10 12:59:23,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13000, loss[loss=0.1094, beats_loss=0.01213, ecapa_loss=0.0002146, whisper_loss=0.09508, over 23223.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0121, ecapa_loss=0.0002608, whisper_loss=0.09556, over 3856845.50 frames. 
], batch size: 93, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 12:59:54,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.945e+01 3.366e+01 4.220e+01 5.870e+01, threshold=6.733e+01, percent-clipped=0.0 2024-08-10 12:59:56,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564970.0, ans=0.1 2024-08-10 13:00:02,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.22 vs. limit=15.0 2024-08-10 13:00:32,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13050, loss[loss=0.1273, beats_loss=0.01087, ecapa_loss=0.0002455, whisper_loss=0.114, over 23058.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01204, ecapa_loss=0.0002593, whisper_loss=0.09599, over 3868318.33 frames. ], batch size: 89, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:00:35,564 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 18 from LS+wenet, 28 from Vox, 47 fro AS 2024-08-10 13:00:42,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565270.0, ans=0.1 2024-08-10 13:00:55,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-08-10 13:01:14,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=565470.0, ans=0.125 2024-08-10 13:01:32,480 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-10 13:01:36,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. 
limit=15.0 2024-08-10 13:01:48,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13100, loss[loss=0.1035, beats_loss=0.01112, ecapa_loss=0.0002335, whisper_loss=0.09007, over 21028.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01211, ecapa_loss=0.0002585, whisper_loss=0.09534, over 3879034.68 frames. ], batch size: 81, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:01:51,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=565770.0, ans=0.125 2024-08-10 13:02:00,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2024-08-10 13:02:05,184 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-10 13:02:08,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=565870.0, ans=0.04949747468305833 2024-08-10 13:02:18,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=565870.0, ans=0.125 2024-08-10 13:02:21,258 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 13:02:26,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 3.014e+01 3.400e+01 3.856e+01 6.675e+01, threshold=6.801e+01, percent-clipped=0.0 2024-08-10 13:02:27,858 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 12 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-10 13:02:42,161 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 13:02:57,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. 
limit=6.0 2024-08-10 13:03:04,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=566170.0, ans=0.125 2024-08-10 13:03:09,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2024-08-10 13:03:10,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13150, loss[loss=0.1188, beats_loss=0.01563, ecapa_loss=0.0001663, whisper_loss=0.1015, over 15673.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01216, ecapa_loss=0.0002563, whisper_loss=0.09582, over 3901815.83 frames. ], batch size: 59, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:03:12,286 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-10 13:03:12,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=566270.0, ans=0.2 2024-08-10 13:03:19,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566270.0, ans=0.1 2024-08-10 13:03:33,825 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 13:03:34,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=566370.0, ans=0.0 2024-08-10 13:03:38,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566370.0, ans=0.125 2024-08-10 13:04:12,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=566670.0, ans=0.125 2024-08-10 13:04:13,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=15.0 2024-08-10 13:04:25,056 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-10 13:04:28,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=566670.0, ans=0.125 2024-08-10 13:04:28,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566670.0, ans=0.125 2024-08-10 13:04:31,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13200, loss[loss=0.08438, beats_loss=0.01665, ecapa_loss=0.0001587, whisper_loss=0.06614, over 20562.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01208, ecapa_loss=0.0002567, whisper_loss=0.0963, over 3902934.27 frames. ], batch size: 79, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:04:33,559 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-10 13:04:40,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566770.0, ans=0.1 2024-08-10 13:04:47,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.52 vs. limit=10.0 2024-08-10 13:04:50,124 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 13:05:07,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.868e+01 3.486e+01 3.850e+01 5.808e+01, threshold=6.972e+01, percent-clipped=0.0 2024-08-10 13:05:17,207 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 13:05:25,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=567070.0, ans=0.2 2024-08-10 13:05:30,262 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:05:50,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13250, loss[loss=0.1171, beats_loss=0.0116, ecapa_loss=0.0002674, whisper_loss=0.1028, over 22958.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01202, ecapa_loss=0.0002567, whisper_loss=0.09656, over 3887965.65 frames. ], batch size: 92, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:06:06,422 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 8 from Vox, 29 fro AS 2024-08-10 13:06:14,030 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 37 from Vox, 27 fro AS 2024-08-10 13:06:44,277 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 13:06:55,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=567670.0, ans=0.2 2024-08-10 13:07:11,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13300, loss[loss=0.09346, beats_loss=0.01347, ecapa_loss=0.0002743, whisper_loss=0.07725, over 20340.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01203, ecapa_loss=0.0002575, whisper_loss=0.09614, over 3879584.85 frames. ], batch size: 87, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:07:22,055 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 13:07:29,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:29,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567870.0, ans=0.125 2024-08-10 13:07:34,622 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 13:07:36,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=567870.0, ans=0.04949747468305833 2024-08-10 13:07:40,538 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 13:07:45,587 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 13:07:48,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 3.028e+01 3.547e+01 3.970e+01 7.425e+01, threshold=7.095e+01, percent-clipped=1.0 2024-08-10 13:08:01,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568070.0, ans=0.1 2024-08-10 13:08:08,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=568070.0, ans=0.125 2024-08-10 13:08:17,802 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 13:08:22,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-10 13:08:25,377 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
32 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 13:08:25,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=568170.0, ans=0.025 2024-08-10 13:08:31,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13350, loss[loss=0.1186, beats_loss=0.0124, ecapa_loss=0.0002578, whisper_loss=0.1036, over 19127.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01204, ecapa_loss=0.000256, whisper_loss=0.09649, over 3885132.87 frames. ], batch size: 77, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:08:40,401 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:08:50,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-10 13:09:11,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-08-10 13:09:13,517 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-10 13:09:19,726 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-10 13:09:24,751 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 13:09:28,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2024-08-10 13:09:30,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. 
limit=6.0 2024-08-10 13:09:46,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13400, loss[loss=0.1248, beats_loss=0.01375, ecapa_loss=0.0002174, whisper_loss=0.1089, over 16916.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01194, ecapa_loss=0.0002577, whisper_loss=0.0966, over 3894052.72 frames. ], batch size: 65, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:09:52,109 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 13:10:18,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.352e+01 2.910e+01 3.380e+01 3.958e+01 6.126e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-10 13:10:27,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=569070.0, ans=0.125 2024-08-10 13:10:34,843 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-10 13:10:40,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=569070.0, ans=0.125 2024-08-10 13:10:45,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=569170.0, ans=0.125 2024-08-10 13:10:51,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=569170.0, ans=0.07 2024-08-10 13:10:56,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13450, loss[loss=0.1116, beats_loss=0.008821, ecapa_loss=0.0003083, whisper_loss=0.09973, over 18983.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01201, ecapa_loss=0.0002562, whisper_loss=0.09645, over 3904668.10 frames. 
], batch size: 81, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:10:59,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=569270.0, ans=0.0 2024-08-10 13:11:03,706 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 13:11:06,379 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 13:11:07,778 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 13:11:19,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-10 13:11:25,526 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 13:11:40,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=569570.0, ans=0.125 2024-08-10 13:11:55,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=569670.0, ans=0.125 2024-08-10 13:12:04,052 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13500, loss[loss=0.1044, beats_loss=0.01007, ecapa_loss=0.0003308, whisper_loss=0.09107, over 18528.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01196, ecapa_loss=0.0002579, whisper_loss=0.09671, over 3907774.14 frames. 
], batch size: 76, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:12:11,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=569770.0, ans=0.125 2024-08-10 13:12:22,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=569870.0, ans=0.2 2024-08-10 13:12:35,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+01 2.996e+01 3.434e+01 4.154e+01 6.721e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 13:12:54,664 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 13:13:04,708 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 13:13:04,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=570170.0, ans=0.04949747468305833 2024-08-10 13:13:11,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13550, loss[loss=0.1105, beats_loss=0.01315, ecapa_loss=0.0002381, whisper_loss=0.09499, over 21114.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.012, ecapa_loss=0.0002578, whisper_loss=0.09592, over 3925221.70 frames. 
], batch size: 85, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:13:14,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=570270.0, ans=0.95 2024-08-10 13:13:17,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=570270.0, ans=0.125 2024-08-10 13:13:31,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=570370.0, ans=0.0 2024-08-10 13:13:43,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=570470.0, ans=0.125 2024-08-10 13:13:51,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=570570.0, ans=0.2 2024-08-10 13:13:59,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=570570.0, ans=0.0 2024-08-10 13:14:03,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=570670.0, ans=0.125 2024-08-10 13:14:16,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13600, loss[loss=0.1227, beats_loss=0.01369, ecapa_loss=0.0002286, whisper_loss=0.1068, over 22839.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0121, ecapa_loss=0.0002558, whisper_loss=0.09544, over 3928070.15 frames. ], batch size: 90, lr: 1.42e-02, grad_scale: 8589934592.0 2024-08-10 13:14:25,050 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 13:14:26,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=570770.0, ans=0.0 2024-08-10 13:14:33,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=570870.0, ans=0.0 2024-08-10 13:14:41,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=570870.0, ans=0.0 2024-08-10 13:14:44,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=570970.0, ans=0.125 2024-08-10 13:14:47,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.840e+01 3.248e+01 3.798e+01 4.801e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-10 13:14:48,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2024-08-10 13:14:54,342 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 13:15:00,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=571070.0, ans=0.2 2024-08-10 13:15:03,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=571070.0, ans=0.125 2024-08-10 13:15:06,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=571070.0, ans=0.125 2024-08-10 13:15:22,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13650, loss[loss=0.08765, beats_loss=0.01394, ecapa_loss=0.0002655, whisper_loss=0.07105, over 14786.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0122, ecapa_loss=0.0002568, whisper_loss=0.0946, over 3881849.98 frames. 
], batch size: 64, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:15:38,438 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-10 13:15:39,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=571370.0, ans=0.125 2024-08-10 13:15:54,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=571470.0, ans=0.0 2024-08-10 13:16:00,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=571570.0, ans=0.125 2024-08-10 13:16:09,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=571570.0, ans=0.5 2024-08-10 13:16:30,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13700, loss[loss=0.1222, beats_loss=0.01262, ecapa_loss=0.000275, whisper_loss=0.1069, over 18906.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01214, ecapa_loss=0.0002565, whisper_loss=0.0957, over 3895882.74 frames. ], batch size: 76, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:16:40,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=10.0 2024-08-10 13:16:50,727 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 13:17:01,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.889e+01 3.237e+01 4.000e+01 5.503e+01, threshold=6.474e+01, percent-clipped=0.0 2024-08-10 13:17:01,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=571970.0, ans=0.0 2024-08-10 13:17:07,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=571970.0, ans=0.0 2024-08-10 13:17:09,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=8.0 2024-08-10 13:17:25,922 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 13:17:38,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13750, loss[loss=0.1165, beats_loss=0.01085, ecapa_loss=0.0002413, whisper_loss=0.1032, over 17638.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01208, ecapa_loss=0.000258, whisper_loss=0.09499, over 3864430.01 frames. ], batch size: 68, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:17:46,020 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-10 13:18:03,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2024-08-10 13:18:04,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=572470.0, ans=0.0 2024-08-10 13:18:32,611 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 13:18:36,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=572670.0, ans=10.0 2024-08-10 13:18:46,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13800, loss[loss=0.09539, beats_loss=0.01355, ecapa_loss=0.0002614, whisper_loss=0.07922, over 15007.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01204, ecapa_loss=0.0002569, whisper_loss=0.09478, over 3822609.22 frames. ], batch size: 61, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:19:14,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=572970.0, ans=0.125 2024-08-10 13:19:16,800 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 13:19:18,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.346e+01 2.990e+01 3.419e+01 4.092e+01 5.899e+01, threshold=6.838e+01, percent-clipped=0.0 2024-08-10 13:19:19,664 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 13:19:26,632 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 13:19:35,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2024-08-10 13:19:39,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. 
limit=15.0 2024-08-10 13:19:44,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573170.0, ans=0.125 2024-08-10 13:19:54,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13850, loss[loss=0.07868, beats_loss=0.01345, ecapa_loss=0.0002586, whisper_loss=0.06264, over 13423.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01205, ecapa_loss=0.0002554, whisper_loss=0.09514, over 3818276.72 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:20:02,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=573270.0, ans=0.0 2024-08-10 13:20:23,931 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 13:20:39,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.008e+05 2024-08-10 13:20:54,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=573670.0, ans=0.0 2024-08-10 13:20:59,498 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 13:21:02,775 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-10 13:21:03,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13900, loss[loss=0.1189, beats_loss=0.0127, ecapa_loss=0.0001809, whisper_loss=0.1044, over 18480.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01208, ecapa_loss=0.0002535, whisper_loss=0.0958, over 3838505.12 frames. ], batch size: 67, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:21:04,107 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 13:21:10,619 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 13:21:20,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=573870.0, ans=0.0 2024-08-10 13:21:33,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-08-10 13:21:35,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 3.046e+01 3.391e+01 3.778e+01 5.936e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 13:21:42,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573970.0, ans=0.125 2024-08-10 13:22:13,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 13950, loss[loss=0.07968, beats_loss=0.01248, ecapa_loss=0.0002441, whisper_loss=0.06476, over 13747.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.0119, ecapa_loss=0.0002548, whisper_loss=0.09715, over 3861404.19 frames. ], batch size: 53, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:22:30,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=574370.0, ans=0.2 2024-08-10 13:22:49,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574470.0, ans=0.0 2024-08-10 13:22:58,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=574570.0, ans=0.2 2024-08-10 13:23:00,568 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 13:23:07,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=574670.0, ans=0.2 2024-08-10 13:23:14,250 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 13:23:18,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=574670.0, ans=0.0 2024-08-10 13:23:20,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=574670.0, ans=0.125 2024-08-10 13:23:22,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14000, loss[loss=0.1266, beats_loss=0.01237, ecapa_loss=0.0002096, whisper_loss=0.1121, over 21895.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01198, ecapa_loss=0.0002532, whisper_loss=0.0968, over 3879384.31 frames. ], batch size: 85, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:23:26,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=574770.0, ans=0.125 2024-08-10 13:23:37,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574870.0, ans=0.1 2024-08-10 13:23:51,141 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 13:23:52,463 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
22 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-10 13:23:52,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=574970.0, ans=0.0 2024-08-10 13:23:55,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.914e+01 3.235e+01 3.866e+01 6.339e+01, threshold=6.469e+01, percent-clipped=0.0 2024-08-10 13:23:59,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574970.0, ans=0.0 2024-08-10 13:24:03,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=575070.0, ans=0.2 2024-08-10 13:24:15,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=575070.0, ans=0.125 2024-08-10 13:24:18,753 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 13:24:25,721 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-10 13:24:29,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2024-08-10 13:24:34,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14050, loss[loss=0.1142, beats_loss=0.01071, ecapa_loss=0.0002952, whisper_loss=0.1006, over 21931.00 frames. ], tot_loss[loss=0.112, beats_loss=0.0119, ecapa_loss=0.000254, whisper_loss=0.09752, over 3893700.80 frames. ], batch size: 87, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:24:36,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2024-08-10 13:24:42,286 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 13:24:53,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=575370.0, ans=0.125 2024-08-10 13:24:59,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=575370.0, ans=0.125 2024-08-10 13:25:19,035 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 13:25:26,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=575570.0, ans=0.125 2024-08-10 13:25:26,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=575570.0, ans=0.1 2024-08-10 13:25:26,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=575570.0, ans=0.125 2024-08-10 13:25:33,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=575670.0, ans=0.0 2024-08-10 13:25:38,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=575670.0, ans=0.2 2024-08-10 13:25:38,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=575670.0, ans=0.0 2024-08-10 13:25:42,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=575670.0, ans=0.125 2024-08-10 13:25:43,521 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 13:25:47,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14100, loss[loss=0.08721, beats_loss=0.01123, ecapa_loss=0.000246, whisper_loss=0.07352, over 13982.00 frames. 
], tot_loss[loss=0.1121, beats_loss=0.0119, ecapa_loss=0.0002528, whisper_loss=0.09763, over 3870340.41 frames. ], batch size: 54, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:26:00,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2024-08-10 13:26:11,544 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 13:26:21,407 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 13:26:25,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.359e+01 3.001e+01 3.648e+01 4.223e+01 8.641e+01, threshold=7.295e+01, percent-clipped=2.0 2024-08-10 13:26:49,315 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 13:26:59,013 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 13:27:08,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14150, loss[loss=0.1049, beats_loss=0.01297, ecapa_loss=0.0002616, whisper_loss=0.08931, over 22265.00 frames. ], tot_loss[loss=0.1118, beats_loss=0.01191, ecapa_loss=0.000253, whisper_loss=0.0974, over 3848573.58 frames. ], batch size: 93, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:27:09,869 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-10 13:27:14,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.29 vs. 
limit=15.0 2024-08-10 13:27:19,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=576270.0, ans=10.0 2024-08-10 13:27:31,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=576370.0, ans=0.0 2024-08-10 13:27:44,147 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 13:27:51,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=576470.0, ans=0.125 2024-08-10 13:28:09,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. limit=10.0 2024-08-10 13:28:23,434 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 13:28:32,916 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14200, loss[loss=0.12, beats_loss=0.01361, ecapa_loss=0.0001882, whisper_loss=0.1045, over 22912.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01191, ecapa_loss=0.000254, whisper_loss=0.09682, over 3869071.64 frames. ], batch size: 90, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:28:55,504 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 13:29:07,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=576870.0, ans=0.0 2024-08-10 13:29:17,155 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.473e+01 3.094e+01 3.432e+01 3.863e+01 7.530e+01, threshold=6.863e+01, percent-clipped=1.0 2024-08-10 13:29:20,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=576970.0, ans=0.125 2024-08-10 13:29:37,215 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 13:29:45,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2024-08-10 13:30:08,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14250, loss[loss=0.111, beats_loss=0.01032, ecapa_loss=0.0002807, whisper_loss=0.09791, over 19392.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01194, ecapa_loss=0.0002517, whisper_loss=0.09671, over 3884024.51 frames. ], batch size: 77, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:30:11,436 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 13:31:19,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577570.0, ans=0.1 2024-08-10 13:31:21,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=577570.0, ans=0.0 2024-08-10 13:31:37,868 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 13:31:38,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=577670.0, ans=0.0 2024-08-10 13:31:47,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=577670.0, ans=0.05 2024-08-10 13:31:56,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14300, loss[loss=0.09846, beats_loss=0.01264, ecapa_loss=0.0002208, whisper_loss=0.08361, over 17733.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01197, ecapa_loss=0.000252, whisper_loss=0.09619, over 3901739.44 frames. ], batch size: 71, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:32:13,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=577770.0, ans=0.09899494936611666 2024-08-10 13:32:22,371 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 13:32:42,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=577970.0, ans=0.0 2024-08-10 13:32:44,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.908e+01 3.226e+01 3.811e+01 6.354e+01, threshold=6.452e+01, percent-clipped=0.0 2024-08-10 13:32:45,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=577970.0, ans=0.0 2024-08-10 13:32:45,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577970.0, ans=0.1 2024-08-10 13:32:48,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=577970.0, ans=0.125 2024-08-10 13:32:56,863 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 13:32:57,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2024-08-10 13:33:03,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=578070.0, ans=0.125 2024-08-10 13:33:05,879 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 13:33:21,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-10 13:33:26,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=578170.0, ans=0.125 2024-08-10 13:33:29,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=578170.0, ans=0.2 2024-08-10 13:33:31,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=578170.0, ans=0.0 2024-08-10 13:33:42,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14350, loss[loss=0.08213, beats_loss=0.01149, ecapa_loss=0.0002902, whisper_loss=0.06773, over 15426.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01207, ecapa_loss=0.0002513, whisper_loss=0.09508, over 3900830.07 frames. ], batch size: 64, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:34:02,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-10 13:34:47,199 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 13:34:49,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578670.0, ans=0.1 2024-08-10 13:34:52,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14400, loss[loss=0.1203, beats_loss=0.01244, ecapa_loss=0.0002292, whisper_loss=0.1056, over 18813.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01207, ecapa_loss=0.0002522, whisper_loss=0.09563, over 3910988.35 frames. ], batch size: 74, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:35:05,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=578870.0, ans=0.5 2024-08-10 13:35:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=578870.0, ans=0.0 2024-08-10 13:35:21,152 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-10 13:35:22,486 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 13:35:23,891 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 13:35:24,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+01 3.244e+01 3.522e+01 4.448e+01 1.287e+02, threshold=7.043e+01, percent-clipped=5.0 2024-08-10 13:35:30,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=578970.0, ans=0.0 2024-08-10 13:35:39,079 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 18 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 13:35:43,573 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 13:35:51,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=579170.0, ans=0.2 2024-08-10 13:35:55,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=579170.0, ans=0.1 2024-08-10 13:35:56,662 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 13:36:02,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 4, batch 14450, loss[loss=0.1088, beats_loss=0.01333, ecapa_loss=0.0002491, whisper_loss=0.09295, over 19794.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01195, ecapa_loss=0.0002555, whisper_loss=0.09622, over 3913359.24 frames. ], batch size: 80, lr: 1.41e-02, grad_scale: 8589934592.0 2024-08-10 13:36:02,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579270.0, ans=0.1 2024-08-10 13:36:12,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=579270.0, ans=0.0 2024-08-10 13:36:17,508 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 13:36:21,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=579370.0, ans=0.125 2024-08-10 13:36:57,432 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-10 13:37:04,500 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-4.pt 2024-08-10 13:37:44,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 0, loss[loss=0.1041, beats_loss=0.01231, ecapa_loss=0.0003027, whisper_loss=0.08874, over 21742.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01231, ecapa_loss=0.0003027, whisper_loss=0.08874, over 21742.00 frames. ], batch size: 91, lr: 1.31e-02, grad_scale: 8589934592.0 2024-08-10 13:37:44,860 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 13:38:27,645 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007699, whisper_loss=0.2545, over 922467.00 frames. 2024-08-10 13:38:42,894 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on SV_voxceleb1: loss=0.006763, beats_loss=0, ecapa_loss=0.0006763, whisper_loss=0, over 939242.00 frames. 2024-08-10 13:40:39,866 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on AT_audioset: loss=0.02719, beats_loss=0.02719, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 13:40:39,870 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 13:41:14,881 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 13:41:17,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=579820.0, ans=0.2 2024-08-10 13:41:25,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=579820.0, ans=0.2 2024-08-10 13:41:27,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579820.0, ans=0.125 2024-08-10 13:41:35,656 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 13:41:53,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 3.049e+01 3.546e+01 4.164e+01 6.478e+01, threshold=7.092e+01, percent-clipped=0.0 2024-08-10 13:42:39,246 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-10 13:42:46,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 50, loss[loss=0.1029, beats_loss=0.01124, ecapa_loss=0.0003165, whisper_loss=0.08853, over 18523.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01233, ecapa_loss=0.0002597, whisper_loss=0.09213, over 866942.43 frames. 
], batch size: 77, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:44:01,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=580520.0, ans=0.0 2024-08-10 13:44:04,074 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:44:08,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=580520.0, ans=0.2 2024-08-10 13:44:28,656 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.523e-01 2024-08-10 13:44:41,932 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 100, loss[loss=0.09808, beats_loss=0.0107, ecapa_loss=0.0002341, whisper_loss=0.08504, over 23673.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01178, ecapa_loss=0.0002598, whisper_loss=0.094, over 1518320.06 frames. ], batch size: 93, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:44:58,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=580720.0, ans=0.125 2024-08-10 13:45:00,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=22.5 2024-08-10 13:45:28,169 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
24 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-10 13:45:41,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=580920.0, ans=0.0 2024-08-10 13:45:43,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+01 3.227e+01 3.615e+01 4.275e+01 6.139e+01, threshold=7.229e+01, percent-clipped=0.0 2024-08-10 13:45:54,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=581020.0, ans=0.0 2024-08-10 13:46:11,961 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 13:46:23,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581120.0, ans=0.1 2024-08-10 13:46:27,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 150, loss[loss=0.1232, beats_loss=0.01102, ecapa_loss=0.0002491, whisper_loss=0.1097, over 22445.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01177, ecapa_loss=0.0002535, whisper_loss=0.09359, over 2060331.96 frames. ], batch size: 89, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:46:32,949 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 13:46:52,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=581320.0, ans=0.125 2024-08-10 13:46:56,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=581320.0, ans=0.125 2024-08-10 13:47:35,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=581620.0, ans=0.125 2024-08-10 13:47:37,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=581620.0, ans=0.125 2024-08-10 13:47:46,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 200, loss[loss=0.102, beats_loss=0.01335, ecapa_loss=0.0002891, whisper_loss=0.08579, over 21601.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01186, ecapa_loss=0.0002535, whisper_loss=0.09379, over 2421963.83 frames. ], batch size: 91, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:47:52,599 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 13:47:55,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=581720.0, ans=0.125 2024-08-10 13:48:02,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=581820.0, ans=0.2 2024-08-10 13:48:13,279 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 13:48:20,504 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 13:48:20,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=581920.0, ans=0.0 2024-08-10 13:48:28,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 3.079e+01 3.499e+01 4.044e+01 6.352e+01, threshold=6.999e+01, percent-clipped=0.0 2024-08-10 13:48:37,717 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 23 from Vox, 17 fro AS 2024-08-10 13:48:53,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-10 13:48:56,577 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 13:48:57,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=22.5 2024-08-10 13:49:01,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 250, loss[loss=0.08871, beats_loss=0.01583, ecapa_loss=0.0001671, whisper_loss=0.07121, over 17356.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01183, ecapa_loss=0.0002508, whisper_loss=0.09405, over 2728602.13 frames. ], batch size: 68, lr: 1.31e-02, grad_scale: 17179869184.0 2024-08-10 13:49:34,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582420.0, ans=0.125 2024-08-10 13:49:39,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=582420.0, ans=0.0 2024-08-10 13:49:45,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. 
limit=10.0 2024-08-10 13:50:11,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=582620.0, ans=0.125 2024-08-10 13:50:16,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 300, loss[loss=0.1227, beats_loss=0.01124, ecapa_loss=0.0002564, whisper_loss=0.1089, over 17261.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01173, ecapa_loss=0.0002507, whisper_loss=0.09534, over 2976733.36 frames. ], batch size: 66, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:50:39,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=582820.0, ans=0.125 2024-08-10 13:50:45,103 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 13:50:58,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.973e+01 3.374e+01 4.127e+01 8.161e+01, threshold=6.749e+01, percent-clipped=1.0 2024-08-10 13:51:00,178 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 13:51:06,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583020.0, ans=0.1 2024-08-10 13:51:13,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=583020.0, ans=0.0 2024-08-10 13:51:19,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=583120.0, ans=0.0 2024-08-10 13:51:21,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=583120.0, ans=0.125 2024-08-10 13:51:23,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=583120.0, ans=0.5 2024-08-10 13:51:24,235 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 13:51:29,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=583220.0, ans=0.125 2024-08-10 13:51:30,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 350, loss[loss=0.1007, beats_loss=0.0114, ecapa_loss=0.0002264, whisper_loss=0.08708, over 15526.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01166, ecapa_loss=0.0002494, whisper_loss=0.09484, over 3143755.63 frames. ], batch size: 61, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:51:31,592 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
19 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-10 13:51:33,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=583220.0, ans=0.125 2024-08-10 13:51:47,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=583320.0, ans=0.125 2024-08-10 13:51:53,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583320.0, ans=0.1 2024-08-10 13:51:54,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=583320.0, ans=0.0 2024-08-10 13:51:55,681 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 13:52:10,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=583420.0, ans=0.05 2024-08-10 13:52:11,007 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 13:52:18,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=583520.0, ans=0.125 2024-08-10 13:52:43,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 400, loss[loss=0.1086, beats_loss=0.01265, ecapa_loss=0.0002632, whisper_loss=0.09334, over 15139.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01157, ecapa_loss=0.000249, whisper_loss=0.09512, over 3289152.79 frames. 
], batch size: 59, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:52:44,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=583720.0, ans=0.2 2024-08-10 13:52:51,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=583720.0, ans=0.125 2024-08-10 13:52:52,451 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 13:53:17,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2024-08-10 13:53:25,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.918e+01 3.260e+01 3.754e+01 7.890e+01, threshold=6.521e+01, percent-clipped=1.0 2024-08-10 13:53:47,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2024-08-10 13:53:51,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=584120.0, ans=0.0 2024-08-10 13:53:53,253 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 13:53:53,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=584120.0, ans=0.09899494936611666 2024-08-10 13:53:55,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=584120.0, ans=0.0 2024-08-10 13:53:56,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=584120.0, ans=0.125 2024-08-10 13:54:00,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 450, loss[loss=0.1204, beats_loss=0.01108, ecapa_loss=0.0002563, whisper_loss=0.1068, over 23811.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01161, ecapa_loss=0.0002483, whisper_loss=0.09492, over 3404447.04 frames. ], batch size: 93, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:54:02,133 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-10 13:54:15,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. limit=6.0 2024-08-10 13:54:23,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=15.0 2024-08-10 13:54:26,569 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 13:54:26,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=584320.0, ans=0.09899494936611666 2024-08-10 13:54:32,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=584420.0, ans=0.0 2024-08-10 13:55:01,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=584620.0, ans=0.0 2024-08-10 13:55:02,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=584620.0, ans=0.2 2024-08-10 13:55:09,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=584620.0, ans=0.2 2024-08-10 13:55:12,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 500, loss[loss=0.1032, beats_loss=0.01255, ecapa_loss=0.0002388, whisper_loss=0.08822, over 22813.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01159, ecapa_loss=0.0002469, whisper_loss=0.09478, over 3501120.09 frames. ], batch size: 91, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:55:18,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=584720.0, ans=0.2 2024-08-10 13:55:26,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=584820.0, ans=0.125 2024-08-10 13:55:32,433 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 13:55:36,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. 
limit=15.0 2024-08-10 13:55:37,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5 2024-08-10 13:55:40,776 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-10 13:55:48,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=584920.0, ans=0.0 2024-08-10 13:55:52,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.793e+01 3.161e+01 3.607e+01 7.948e+01, threshold=6.322e+01, percent-clipped=1.0 2024-08-10 13:55:55,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=12.0 2024-08-10 13:56:05,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=585020.0, ans=0.0 2024-08-10 13:56:17,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-08-10 13:56:24,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 550, loss[loss=0.09306, beats_loss=0.0121, ecapa_loss=0.0002689, whisper_loss=0.07827, over 14729.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01165, ecapa_loss=0.0002456, whisper_loss=0.09496, over 3572762.15 frames. ], batch size: 62, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:56:25,917 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:56:31,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585220.0, ans=0.0 2024-08-10 13:56:42,618 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 13:56:45,332 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 13:56:46,818 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 13:57:03,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2024-08-10 13:57:09,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-10 13:57:13,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=585520.0, ans=0.2 2024-08-10 13:57:25,973 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 13:57:34,488 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 15 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-10 13:57:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585720.0, ans=0.1 2024-08-10 13:57:36,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 600, loss[loss=0.08348, beats_loss=0.01234, ecapa_loss=0.0002927, whisper_loss=0.06821, over 15359.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01172, ecapa_loss=0.0002429, whisper_loss=0.09456, over 3618325.91 frames. ], batch size: 67, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:57:50,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=585820.0, ans=0.1 2024-08-10 13:57:58,469 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
22 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-10 13:58:01,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=585820.0, ans=0.2 2024-08-10 13:58:07,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=585920.0, ans=0.0 2024-08-10 13:58:09,923 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 13:58:16,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.818e+01 3.113e+01 3.779e+01 5.763e+01, threshold=6.225e+01, percent-clipped=0.0 2024-08-10 13:58:22,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=586020.0, ans=0.0 2024-08-10 13:58:23,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 13:58:29,621 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 13:58:43,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=586120.0, ans=10.0 2024-08-10 13:58:43,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-10 13:58:46,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=586120.0, ans=10.0 2024-08-10 13:58:48,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 650, loss[loss=0.09144, beats_loss=0.0139, ecapa_loss=0.0002384, whisper_loss=0.07516, over 22011.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01172, ecapa_loss=0.0002419, whisper_loss=0.09438, over 3662028.52 frames. 
], batch size: 89, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 13:58:59,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-08-10 13:59:07,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=586320.0, ans=0.0 2024-08-10 13:59:11,260 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 13:59:16,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=22.5 2024-08-10 13:59:26,682 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 13:59:28,222 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-10 13:59:35,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=586520.0, ans=0.0 2024-08-10 13:59:36,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0 2024-08-10 13:59:58,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 700, loss[loss=0.103, beats_loss=0.01098, ecapa_loss=0.0002962, whisper_loss=0.08904, over 14417.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01171, ecapa_loss=0.0002424, whisper_loss=0.09473, over 3679665.24 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:00:01,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=586720.0, ans=0.125 2024-08-10 14:00:07,677 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-10 14:00:12,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-08-10 14:00:15,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-10 14:00:21,157 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 14:00:25,439 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-10 14:00:38,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.878e+01 3.235e+01 3.847e+01 7.521e+01, threshold=6.470e+01, percent-clipped=2.0 2024-08-10 14:00:41,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=587020.0, ans=0.0 2024-08-10 14:00:53,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=587020.0, ans=0.125 2024-08-10 14:00:57,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=587120.0, ans=0.1 2024-08-10 14:01:01,096 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 14:01:07,265 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 14:01:11,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 750, loss[loss=0.1251, beats_loss=0.01196, ecapa_loss=0.0002371, whisper_loss=0.1107, over 22120.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01175, ecapa_loss=0.0002416, whisper_loss=0.09465, over 3705521.61 frames. 
], batch size: 88, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:01:23,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=587220.0, ans=0.125 2024-08-10 14:01:36,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-10 14:01:37,054 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-10 14:01:38,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=587420.0, ans=0.125 2024-08-10 14:01:40,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=587420.0, ans=0.2 2024-08-10 14:01:41,462 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-10 14:01:43,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587420.0, ans=0.1 2024-08-10 14:01:54,209 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:02:02,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=587520.0, ans=0.125 2024-08-10 14:02:12,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=587620.0, ans=0.125 2024-08-10 14:02:20,514 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 14:02:21,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 800, loss[loss=0.1071, beats_loss=0.01312, ecapa_loss=0.000211, whisper_loss=0.09188, over 19315.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01177, ecapa_loss=0.0002391, whisper_loss=0.09415, over 3687029.49 frames. ], batch size: 76, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:02:32,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587720.0, ans=0.1 2024-08-10 14:02:34,521 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-10 14:03:01,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.857e+01 3.282e+01 4.072e+01 6.223e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 14:03:01,568 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-10 14:03:06,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-10 14:03:07,567 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 14:03:13,309 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 14:03:19,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=588120.0, ans=0.125 2024-08-10 14:03:33,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 850, loss[loss=0.1212, beats_loss=0.009584, ecapa_loss=0.000226, whisper_loss=0.1094, over 23981.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01184, ecapa_loss=0.0002376, whisper_loss=0.09384, over 3734468.05 frames. ], batch size: 91, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:03:34,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=588220.0, ans=0.125 2024-08-10 14:03:38,376 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 14:03:38,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=12.0 2024-08-10 14:04:09,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588420.0, ans=0.0 2024-08-10 14:04:36,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=588620.0, ans=0.0 2024-08-10 14:04:40,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588620.0, ans=0.1 2024-08-10 14:04:50,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 900, loss[loss=0.09604, beats_loss=0.01369, ecapa_loss=0.0002054, whisper_loss=0.0803, over 19145.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01185, ecapa_loss=0.0002372, whisper_loss=0.09355, over 3754940.68 frames. ], batch size: 75, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:05:01,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=588720.0, ans=0.0 2024-08-10 14:05:10,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=588820.0, ans=0.025 2024-08-10 14:05:18,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=588820.0, ans=0.025 2024-08-10 14:05:25,440 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 14:05:32,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.732e+01 3.108e+01 3.625e+01 6.653e+01, threshold=6.216e+01, percent-clipped=1.0 2024-08-10 14:05:36,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=589020.0, ans=0.0 2024-08-10 14:06:05,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 950, loss[loss=0.09541, beats_loss=0.01392, ecapa_loss=0.0002284, whisper_loss=0.07921, over 21413.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01194, ecapa_loss=0.0002348, whisper_loss=0.09345, over 3797703.17 frames. ], batch size: 90, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:06:44,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589420.0, ans=0.0 2024-08-10 14:06:45,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=589420.0, ans=0.2 2024-08-10 14:07:07,004 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 14:07:15,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589620.0, ans=0.1 2024-08-10 14:07:18,420 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 14:07:19,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.18 vs. limit=15.0 2024-08-10 14:07:21,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1000, loss[loss=0.1177, beats_loss=0.01081, ecapa_loss=0.0002311, whisper_loss=0.1045, over 16419.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01193, ecapa_loss=0.0002334, whisper_loss=0.09307, over 3773485.41 frames. 
], batch size: 62, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:07:34,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2024-08-10 14:07:44,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=589820.0, ans=0.0 2024-08-10 14:08:04,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+01 2.773e+01 3.202e+01 3.484e+01 8.284e+01, threshold=6.403e+01, percent-clipped=2.0 2024-08-10 14:08:16,596 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-10 14:08:33,581 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 14:08:37,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1050, loss[loss=0.09141, beats_loss=0.01218, ecapa_loss=0.0002087, whisper_loss=0.07715, over 20748.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01185, ecapa_loss=0.0002341, whisper_loss=0.09396, over 3758156.64 frames. ], batch size: 80, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:08:44,179 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-10 14:09:17,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=590420.0, ans=22.5 2024-08-10 14:09:55,096 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1100, loss[loss=0.118, beats_loss=0.01231, ecapa_loss=0.0002345, whisper_loss=0.1033, over 22605.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01183, ecapa_loss=0.0002337, whisper_loss=0.09465, over 3789770.23 frames. 
], batch size: 89, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:10:12,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=590820.0, ans=0.0 2024-08-10 14:10:12,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=590820.0, ans=0.0 2024-08-10 14:10:20,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2024-08-10 14:10:27,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=590920.0, ans=0.125 2024-08-10 14:10:31,947 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 14:10:42,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.893e+01 3.257e+01 3.748e+01 6.503e+01, threshold=6.515e+01, percent-clipped=1.0 2024-08-10 14:10:43,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-10 14:11:15,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1150, loss[loss=0.1391, beats_loss=0.009193, ecapa_loss=0.0002247, whisper_loss=0.1276, over 24596.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01177, ecapa_loss=0.0002355, whisper_loss=0.09487, over 3806141.52 frames. 
], batch size: 92, lr: 1.30e-02, grad_scale: 17179869184.0 2024-08-10 14:11:18,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=591220.0, ans=0.125 2024-08-10 14:11:22,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=591220.0, ans=0.2 2024-08-10 14:11:27,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=591220.0, ans=0.125 2024-08-10 14:11:42,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=591320.0, ans=0.125 2024-08-10 14:11:47,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=591420.0, ans=0.125 2024-08-10 14:11:48,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=591420.0, ans=0.125 2024-08-10 14:11:51,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-08-10 14:11:57,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=591520.0, ans=0.0 2024-08-10 14:12:03,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591520.0, ans=0.125 2024-08-10 14:12:05,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-08-10 14:12:10,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. 
limit=22.5 2024-08-10 14:12:11,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=591520.0, ans=0.0 2024-08-10 14:12:29,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1200, loss[loss=0.108, beats_loss=0.01223, ecapa_loss=0.0002317, whisper_loss=0.09346, over 19402.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01173, ecapa_loss=0.0002363, whisper_loss=0.09436, over 3780792.92 frames. ], batch size: 78, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:12:43,667 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-10 14:12:51,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-08-10 14:12:56,932 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 14:13:00,181 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 14:13:07,516 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-10 14:13:09,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=591920.0, ans=0.2 2024-08-10 14:13:12,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+01 2.894e+01 3.360e+01 3.999e+01 6.251e+01, threshold=6.719e+01, percent-clipped=0.0 2024-08-10 14:13:15,790 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-10 14:13:37,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=592120.0, ans=0.0 2024-08-10 14:13:41,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592120.0, ans=0.1 2024-08-10 14:13:46,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1250, loss[loss=0.1548, beats_loss=0.009894, ecapa_loss=0.0002213, whisper_loss=0.1427, over 20617.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01169, ecapa_loss=0.0002353, whisper_loss=0.09505, over 3764913.24 frames. ], batch size: 75, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:13:56,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=592220.0, ans=0.1 2024-08-10 14:14:00,090 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 14:14:08,118 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 14:14:18,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.84 vs. 
limit=12.0 2024-08-10 14:14:36,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=592520.0, ans=0.125 2024-08-10 14:14:39,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=592520.0, ans=0.125 2024-08-10 14:14:59,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=592620.0, ans=0.07 2024-08-10 14:15:03,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1300, loss[loss=0.104, beats_loss=0.01305, ecapa_loss=0.0001933, whisper_loss=0.08898, over 18882.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0118, ecapa_loss=0.0002349, whisper_loss=0.09425, over 3793288.19 frames. ], batch size: 73, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:15:40,982 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 14:15:42,463 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.129e-01 2024-08-10 14:15:50,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.731e+01 3.070e+01 3.519e+01 6.243e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 14:15:58,044 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-10 14:16:05,375 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 14:16:10,013 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-10 14:16:22,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=593220.0, ans=0.125 2024-08-10 14:16:24,055 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1350, loss[loss=0.1152, beats_loss=0.01155, ecapa_loss=0.000242, whisper_loss=0.1012, over 22765.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01188, ecapa_loss=0.0002341, whisper_loss=0.09412, over 3815284.86 frames. ], batch size: 91, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:16:34,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=593220.0, ans=0.125 2024-08-10 14:16:36,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=593220.0, ans=0.2 2024-08-10 14:17:00,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2024-08-10 14:17:24,545 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 from AS 2024-08-10 14:17:41,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-10 14:17:44,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1400, loss[loss=0.1138, beats_loss=0.01053, ecapa_loss=0.0001874, whisper_loss=0.1014, over 16617.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01189, ecapa_loss=0.0002323, whisper_loss=0.09361, over 3826187.99 frames. ], batch size: 63, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:17:46,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=593720.0, ans=0.125 2024-08-10 14:17:52,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=593720.0, ans=0.2 2024-08-10 14:17:53,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.29 vs. 
limit=15.0 2024-08-10 14:17:57,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593720.0, ans=0.1 2024-08-10 14:17:59,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=593820.0, ans=0.125 2024-08-10 14:18:15,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593920.0, ans=0.1 2024-08-10 14:18:18,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=593920.0, ans=0.125 2024-08-10 14:18:18,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593920.0, ans=0.1 2024-08-10 14:18:25,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.747e+01 3.189e+01 3.732e+01 5.782e+01, threshold=6.377e+01, percent-clipped=0.0 2024-08-10 14:18:25,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=593920.0, ans=0.0 2024-08-10 14:18:27,093 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.562e+03 2024-08-10 14:18:31,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2024-08-10 14:18:45,321 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 14:18:56,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1450, loss[loss=0.07198, beats_loss=0.01435, ecapa_loss=0.000228, whisper_loss=0.05535, over 15975.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01188, ecapa_loss=0.0002307, whisper_loss=0.09333, over 3784137.22 frames. 
], batch size: 66, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:19:45,557 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 from AS 2024-08-10 14:19:49,048 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 14:19:53,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-08-10 14:19:53,983 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 from AS 2024-08-10 14:20:15,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594520.0, ans=0.1 2024-08-10 14:20:18,815 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 from AS 2024-08-10 14:20:29,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=594620.0, ans=0.125 2024-08-10 14:20:35,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594620.0, ans=0.1 2024-08-10 14:20:36,385 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 14:20:40,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1500, loss[loss=0.07084, beats_loss=0.01068, ecapa_loss=0.00017, whisper_loss=0.05846, over 15886.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01193, ecapa_loss=0.0002306, whisper_loss=0.09305, over 3806652.61 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:20:47,309 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 from AS 2024-08-10 14:20:47,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.95 vs. 
limit=15.0 2024-08-10 14:21:00,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.59 vs. limit=15.0 2024-08-10 14:21:13,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=594920.0, ans=0.2 2024-08-10 14:21:24,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.686e+01 3.011e+01 3.504e+01 1.040e+02, threshold=6.023e+01, percent-clipped=2.0 2024-08-10 14:21:36,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=595020.0, ans=0.0 2024-08-10 14:21:42,508 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 from AS 2024-08-10 14:21:52,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=595120.0, ans=0.2 2024-08-10 14:21:58,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595220.0, ans=0.125 2024-08-10 14:21:59,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1550, loss[loss=0.1096, beats_loss=0.01043, ecapa_loss=0.0002368, whisper_loss=0.09678, over 16910.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01183, ecapa_loss=0.0002296, whisper_loss=0.09391, over 3797355.79 frames. ], batch size: 67, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:22:05,268 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 31 from Vox, 39 from AS 2024-08-10 14:22:07,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=595220.0, ans=0.125 2024-08-10 14:22:08,231 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 14:22:19,756 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
24 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 14:22:23,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=595320.0, ans=0.0 2024-08-10 14:22:27,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2024-08-10 14:22:29,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595420.0, ans=0.1 2024-08-10 14:22:46,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.32 vs. limit=22.5 2024-08-10 14:22:56,869 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-10 14:23:15,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1600, loss[loss=0.1405, beats_loss=0.007304, ecapa_loss=0.0002196, whisper_loss=0.131, over 17671.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01192, ecapa_loss=0.0002286, whisper_loss=0.09351, over 3822547.81 frames. ], batch size: 63, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:23:29,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=595720.0, ans=0.2 2024-08-10 14:23:36,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=595820.0, ans=0.2 2024-08-10 14:23:42,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=595820.0, ans=0.0 2024-08-10 14:23:44,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=595820.0, ans=0.125 2024-08-10 14:23:48,922 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
32 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 14:23:59,561 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.802e+01 3.147e+01 3.611e+01 5.289e+01, threshold=6.294e+01, percent-clipped=0.0 2024-08-10 14:24:09,821 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 13 from LS+wenet, 14 from Vox, 37 from AS 2024-08-10 14:24:29,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596120.0, ans=0.1 2024-08-10 14:24:30,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=596120.0, ans=0.125 2024-08-10 14:24:37,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1650, loss[loss=0.1275, beats_loss=0.008572, ecapa_loss=0.0002659, whisper_loss=0.1162, over 16404.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01194, ecapa_loss=0.0002287, whisper_loss=0.09403, over 3825185.99 frames. ], batch size: 64, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:24:54,567 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 14:24:57,275 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 from AS 2024-08-10 14:25:13,514 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 from AS 2024-08-10 14:25:22,855 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.659e-02 2024-08-10 14:25:32,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=596520.0, ans=0.2 2024-08-10 14:25:37,509 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 from AS 2024-08-10 14:25:37,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596620.0, ans=0.1 2024-08-10 14:25:42,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=596620.0, ans=0.2 2024-08-10 14:25:43,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2024-08-10 14:25:47,093 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 from AS 2024-08-10 14:25:48,399 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 from AS 2024-08-10 14:25:52,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1700, loss[loss=0.09221, beats_loss=0.01135, ecapa_loss=0.0002698, whisper_loss=0.07816, over 23301.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01192, ecapa_loss=0.0002277, whisper_loss=0.0943, over 3815458.53 frames. ], batch size: 97, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:25:53,091 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
26 from LS+wenet, 18 from Vox, 26 from AS 2024-08-10 14:25:59,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=596720.0, ans=0.125 2024-08-10 14:26:02,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=596720.0, ans=0.125 2024-08-10 14:26:25,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=596920.0, ans=0.1 2024-08-10 14:26:34,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+01 2.744e+01 3.070e+01 3.564e+01 5.631e+01, threshold=6.139e+01, percent-clipped=0.0 2024-08-10 14:26:36,804 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 24 from Vox, 26 from AS 2024-08-10 14:26:58,118 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.003e+01 2024-08-10 14:27:04,386 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 24 from Vox, 30 from AS 2024-08-10 14:27:08,787 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1750, loss[loss=0.09282, beats_loss=0.01284, ecapa_loss=0.0002712, whisper_loss=0.07727, over 22197.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01184, ecapa_loss=0.0002294, whisper_loss=0.09424, over 3821681.33 frames. ], batch size: 92, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:27:14,529 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 22 from Vox, 19 from AS 2024-08-10 14:27:17,046 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
34 from LS+wenet, 18 from Vox, 41 from AS 2024-08-10 14:27:29,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=597320.0, ans=0.125 2024-08-10 14:27:29,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-08-10 14:27:38,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=597320.0, ans=0.125 2024-08-10 14:27:38,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=597320.0, ans=0.05 2024-08-10 14:27:50,816 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 from AS 2024-08-10 14:28:01,822 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 14:28:04,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597520.0, ans=0.125 2024-08-10 14:28:07,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597520.0, ans=0.125 2024-08-10 14:28:21,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=597620.0, ans=0.125 2024-08-10 14:28:21,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.06 vs. limit=15.0 2024-08-10 14:28:21,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. 
limit=15.0 2024-08-10 14:28:25,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=597720.0, ans=0.125 2024-08-10 14:28:26,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1800, loss[loss=0.1166, beats_loss=0.01317, ecapa_loss=0.000234, whisper_loss=0.1011, over 23010.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01179, ecapa_loss=0.0002296, whisper_loss=0.09435, over 3795292.23 frames. ], batch size: 92, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:28:29,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=597720.0, ans=0.125 2024-08-10 14:28:40,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-10 14:28:50,384 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-10 14:29:07,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.774e+01 3.082e+01 3.729e+01 4.718e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-10 14:29:22,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=12.0 2024-08-10 14:29:40,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1850, loss[loss=0.1245, beats_loss=0.01037, ecapa_loss=0.0002586, whisper_loss=0.1115, over 22498.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01168, ecapa_loss=0.0002333, whisper_loss=0.09602, over 3816451.64 frames. ], batch size: 88, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:29:42,494 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS 2024-08-10 14:29:47,911 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 14:30:12,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=598420.0, ans=0.0 2024-08-10 14:30:14,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-10 14:30:15,887 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 24 from Vox, 27 from AS 2024-08-10 14:30:17,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=598420.0, ans=0.125 2024-08-10 14:30:28,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=598520.0, ans=0.5 2024-08-10 14:30:36,350 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 14:30:36,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=598520.0, ans=0.125 2024-08-10 14:30:47,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=598620.0, ans=0.2 2024-08-10 14:30:52,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.21 vs. limit=22.5 2024-08-10 14:30:57,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1900, loss[loss=0.1252, beats_loss=0.01038, ecapa_loss=0.0002699, whisper_loss=0.1121, over 16784.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.0117, ecapa_loss=0.0002377, whisper_loss=0.09604, over 3801712.98 frames. 
], batch size: 66, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:30:57,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=598720.0, ans=0.0 2024-08-10 14:31:16,352 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 from AS 2024-08-10 14:31:16,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-10 14:31:41,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.853e+01 3.252e+01 3.827e+01 6.548e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-10 14:31:49,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=599020.0, ans=0.05 2024-08-10 14:31:53,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=599020.0, ans=0.5 2024-08-10 14:31:58,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=599120.0, ans=0.0 2024-08-10 14:32:04,340 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 from AS 2024-08-10 14:32:05,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=599120.0, ans=0.0 2024-08-10 14:32:07,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=15.0 2024-08-10 14:32:14,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 1950, loss[loss=0.145, beats_loss=0.008897, ecapa_loss=0.0002438, whisper_loss=0.1337, over 22555.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01162, ecapa_loss=0.0002414, whisper_loss=0.09649, over 3812910.96 frames. 
], batch size: 83, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:32:31,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=599320.0, ans=0.2 2024-08-10 14:32:37,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-08-10 14:32:50,087 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 14:32:54,619 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 14:33:08,590 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 from AS 2024-08-10 14:33:18,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=599620.0, ans=0.2 2024-08-10 14:33:30,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2000, loss[loss=0.1123, beats_loss=0.008578, ecapa_loss=0.0002853, whisper_loss=0.1008, over 16029.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01163, ecapa_loss=0.0002437, whisper_loss=0.09694, over 3824018.62 frames. 
], batch size: 61, lr: 1.29e-02, grad_scale: 17179869184.0 2024-08-10 14:33:41,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599720.0, ans=0.125 2024-08-10 14:33:45,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599820.0, ans=0.125 2024-08-10 14:33:47,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=599820.0, ans=0.0 2024-08-10 14:33:47,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=599820.0, ans=0.1 2024-08-10 14:33:49,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=599820.0, ans=0.07 2024-08-10 14:34:01,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. limit=22.5 2024-08-10 14:34:12,328 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-60000.pt 2024-08-10 14:34:16,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+01 2.787e+01 3.156e+01 3.560e+01 5.120e+01, threshold=6.313e+01, percent-clipped=0.0 2024-08-10 14:34:19,005 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
15 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 14:34:20,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=600020.0, ans=0.2 2024-08-10 14:34:26,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=600020.0, ans=0.125 2024-08-10 14:34:31,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=600020.0, ans=0.125 2024-08-10 14:34:42,986 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 14:34:50,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2050, loss[loss=0.09658, beats_loss=0.01299, ecapa_loss=0.0002275, whisper_loss=0.08132, over 22557.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01171, ecapa_loss=0.0002414, whisper_loss=0.09657, over 3816517.33 frames. ], batch size: 88, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:35:01,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-08-10 14:35:03,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=600320.0, ans=0.0 2024-08-10 14:35:03,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=600320.0, ans=0.0 2024-08-10 14:35:17,853 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 16 from Vox, 26 from AS 2024-08-10 14:35:18,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600320.0, ans=0.1 2024-08-10 14:35:18,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=600320.0, ans=0.2 2024-08-10 14:35:29,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=600420.0, ans=0.125 2024-08-10 14:35:30,073 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 from AS 2024-08-10 14:35:35,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=600520.0, ans=0.0 2024-08-10 14:35:37,761 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 14:35:47,508 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 31 from Vox, 35 from AS 2024-08-10 14:36:04,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2100, loss[loss=0.1049, beats_loss=0.01266, ecapa_loss=0.0002014, whisper_loss=0.09023, over 18988.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0118, ecapa_loss=0.0002422, whisper_loss=0.09554, over 3820427.10 frames. ], batch size: 73, lr: 1.29e-02, grad_scale: 34359738368.0 2024-08-10 14:36:08,422 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 from AS 2024-08-10 14:36:12,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.02 vs. 
limit=22.5 2024-08-10 14:36:23,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=600820.0, ans=0.125 2024-08-10 14:36:46,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.750e+01 3.110e+01 3.646e+01 5.998e+01, threshold=6.220e+01, percent-clipped=0.0 2024-08-10 14:36:48,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=601020.0, ans=0.0 2024-08-10 14:36:57,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=601020.0, ans=0.125 2024-08-10 14:37:00,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=601020.0, ans=0.125 2024-08-10 14:37:07,729 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS 2024-08-10 14:37:19,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2150, loss[loss=0.1223, beats_loss=0.009348, ecapa_loss=0.0002742, whisper_loss=0.1102, over 21389.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01187, ecapa_loss=0.0002426, whisper_loss=0.09558, over 3834287.52 frames. 
], batch size: 81, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:37:24,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601220.0, ans=0.1 2024-08-10 14:37:36,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:37:39,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601320.0, ans=0.125 2024-08-10 14:37:43,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=601320.0, ans=0.2 2024-08-10 14:37:50,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=601420.0, ans=0.0 2024-08-10 14:37:52,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=601420.0, ans=0.125 2024-08-10 14:37:57,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=601420.0, ans=0.125 2024-08-10 14:38:15,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=601520.0, ans=0.125 2024-08-10 14:38:35,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2200, loss[loss=0.1175, beats_loss=0.01158, ecapa_loss=0.0002489, whisper_loss=0.1034, over 13770.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01191, ecapa_loss=0.0002427, whisper_loss=0.09617, over 3845150.87 frames. 
], batch size: 56, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:38:37,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=601720.0, ans=0.0 2024-08-10 14:38:40,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=601720.0, ans=0.5 2024-08-10 14:38:59,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=601820.0, ans=0.2 2024-08-10 14:39:08,138 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 14:39:11,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-08-10 14:39:14,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.850e+01 3.154e+01 3.768e+01 5.598e+01, threshold=6.309e+01, percent-clipped=0.0 2024-08-10 14:39:15,924 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-10 14:39:31,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.03 vs. limit=10.0 2024-08-10 14:39:34,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=602120.0, ans=0.125 2024-08-10 14:39:35,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=602120.0, ans=0.0 2024-08-10 14:39:39,158 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 from AS 2024-08-10 14:39:41,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. 
limit=10.0 2024-08-10 14:39:42,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2250, loss[loss=0.1246, beats_loss=0.01106, ecapa_loss=0.0002634, whisper_loss=0.1109, over 17389.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01201, ecapa_loss=0.0002419, whisper_loss=0.09539, over 3834524.23 frames. ], batch size: 68, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:39:58,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=602320.0, ans=0.0 2024-08-10 14:40:06,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=602320.0, ans=0.125 2024-08-10 14:40:11,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=602420.0, ans=0.2 2024-08-10 14:40:17,719 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 30 from Vox, 23 fro AS 2024-08-10 14:40:47,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2300, loss[loss=0.1284, beats_loss=0.009625, ecapa_loss=0.000283, whisper_loss=0.1159, over 22373.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01196, ecapa_loss=0.0002426, whisper_loss=0.09561, over 3845013.02 frames. 
], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:40:52,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=602720.0, ans=0.125 2024-08-10 14:41:00,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=602820.0, ans=0.125 2024-08-10 14:41:05,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=602820.0, ans=22.5 2024-08-10 14:41:06,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=602820.0, ans=0.0 2024-08-10 14:41:15,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=602920.0, ans=0.0 2024-08-10 14:41:23,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.867e+01 3.175e+01 3.741e+01 6.464e+01, threshold=6.350e+01, percent-clipped=1.0 2024-08-10 14:41:41,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0 2024-08-10 14:41:46,585 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 14:41:51,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2350, loss[loss=0.1128, beats_loss=0.01293, ecapa_loss=0.0002183, whisper_loss=0.09766, over 20112.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01187, ecapa_loss=0.000245, whisper_loss=0.09575, over 3818736.00 frames. 
], batch size: 79, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:42:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=603420.0, ans=0.0 2024-08-10 14:42:26,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-08-10 14:42:40,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=603520.0, ans=0.125 2024-08-10 14:42:55,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2400, loss[loss=0.1301, beats_loss=0.01041, ecapa_loss=0.0002184, whisper_loss=0.1175, over 24259.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01188, ecapa_loss=0.0002439, whisper_loss=0.09593, over 3847023.11 frames. ], batch size: 94, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:43:02,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=603720.0, ans=0.0 2024-08-10 14:43:06,998 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 14:43:15,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=603820.0, ans=0.0 2024-08-10 14:43:16,132 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 14:43:28,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=603920.0, ans=0.125 2024-08-10 14:43:31,813 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+01 2.726e+01 3.127e+01 3.676e+01 5.177e+01, threshold=6.255e+01, percent-clipped=0.0 2024-08-10 14:43:37,220 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-10 14:43:38,319 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 14:43:43,903 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 14:43:46,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604120.0, ans=0.125 2024-08-10 14:44:00,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2450, loss[loss=0.1329, beats_loss=0.006525, ecapa_loss=0.0003036, whisper_loss=0.1234, over 16819.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01181, ecapa_loss=0.0002431, whisper_loss=0.09594, over 3836596.72 frames. ], batch size: 65, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:44:15,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=604320.0, ans=0.95 2024-08-10 14:44:30,614 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 14:44:32,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=604420.0, ans=0.0 2024-08-10 14:44:33,100 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 14:44:36,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=604420.0, ans=0.125 2024-08-10 14:44:40,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=604520.0, ans=0.0 2024-08-10 14:44:40,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604520.0, ans=0.125 2024-08-10 14:44:46,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=8.0 2024-08-10 14:44:46,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604520.0, ans=0.125 2024-08-10 14:44:48,930 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 14:44:54,080 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 14:45:05,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2500, loss[loss=0.112, beats_loss=0.011, ecapa_loss=0.0002112, whisper_loss=0.0989, over 17095.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0118, ecapa_loss=0.0002457, whisper_loss=0.09566, over 3846750.48 frames. ], batch size: 62, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:45:07,437 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 14:45:16,350 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-10 14:45:17,663 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 14:45:42,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.835e+01 3.123e+01 3.643e+01 5.985e+01, threshold=6.245e+01, percent-clipped=0.0 2024-08-10 14:45:58,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=605120.0, ans=0.125 2024-08-10 14:46:05,810 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-10 14:46:06,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=605120.0, ans=0.125 2024-08-10 14:46:11,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2550, loss[loss=0.1073, beats_loss=0.01344, ecapa_loss=0.0002002, whisper_loss=0.09187, over 18183.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.0118, ecapa_loss=0.0002449, whisper_loss=0.09545, over 3863066.38 frames. ], batch size: 72, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:46:26,483 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 14:46:27,755 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 14:46:32,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=605320.0, ans=0.125 2024-08-10 14:46:53,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=605520.0, ans=0.2 2024-08-10 14:47:00,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.87 vs. 
limit=15.0 2024-08-10 14:47:09,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=605620.0, ans=0.125 2024-08-10 14:47:14,472 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 14:47:15,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2600, loss[loss=0.1152, beats_loss=0.01227, ecapa_loss=0.0002191, whisper_loss=0.1008, over 22904.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01192, ecapa_loss=0.0002432, whisper_loss=0.09491, over 3860472.09 frames. ], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:47:34,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=605820.0, ans=0.125 2024-08-10 14:47:49,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2024-08-10 14:47:51,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+01 2.738e+01 3.065e+01 3.602e+01 6.052e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 14:47:55,038 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-10 14:48:00,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=606020.0, ans=0.125 2024-08-10 14:48:06,778 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 14:48:12,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=606120.0, ans=10.0 2024-08-10 14:48:20,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2650, loss[loss=0.1244, beats_loss=0.009323, ecapa_loss=0.0003159, whisper_loss=0.1119, over 22072.00 frames. 
], tot_loss[loss=0.1092, beats_loss=0.01193, ecapa_loss=0.0002437, whisper_loss=0.09484, over 3873465.39 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:48:22,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2024-08-10 14:48:23,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=606220.0, ans=0.125 2024-08-10 14:48:26,066 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 14:48:30,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=606220.0, ans=0.0 2024-08-10 14:48:35,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-08-10 14:48:44,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=606320.0, ans=0.2 2024-08-10 14:48:47,954 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-10 14:48:48,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=606420.0, ans=15.0 2024-08-10 14:49:02,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=606520.0, ans=0.0 2024-08-10 14:49:02,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=606520.0, ans=0.0 2024-08-10 14:49:09,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. 
limit=22.5 2024-08-10 14:49:24,637 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 14:49:25,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2700, loss[loss=0.1103, beats_loss=0.009898, ecapa_loss=0.0002715, whisper_loss=0.09767, over 23018.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01191, ecapa_loss=0.0002444, whisper_loss=0.09435, over 3870617.04 frames. ], batch size: 91, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:49:40,198 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 14:49:53,281 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 14:50:02,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 3.024e+01 3.379e+01 4.188e+01 8.555e+01, threshold=6.757e+01, percent-clipped=2.0 2024-08-10 14:50:04,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=12.0 2024-08-10 14:50:05,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=607020.0, ans=0.125 2024-08-10 14:50:06,783 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.637e-01 2024-08-10 14:50:23,245 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 14:50:23,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=607120.0, ans=0.125 2024-08-10 14:50:27,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. 
limit=15.0 2024-08-10 14:50:28,943 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 14:50:30,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2750, loss[loss=0.1152, beats_loss=0.01051, ecapa_loss=0.0003206, whisper_loss=0.1014, over 14419.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01185, ecapa_loss=0.0002437, whisper_loss=0.09464, over 3857507.87 frames. ], batch size: 58, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:50:50,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-10 14:50:58,942 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 14:51:11,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=607520.0, ans=0.2 2024-08-10 14:51:33,378 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-10 14:51:36,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2800, loss[loss=0.1244, beats_loss=0.0131, ecapa_loss=0.0001903, whisper_loss=0.1094, over 24296.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01181, ecapa_loss=0.0002425, whisper_loss=0.09626, over 3879993.40 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:51:38,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=607720.0, ans=0.125 2024-08-10 14:51:42,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=607720.0, ans=0.125 2024-08-10 14:51:47,733 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 14:51:50,484 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 14:52:03,522 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 14:52:07,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=607920.0, ans=0.0 2024-08-10 14:52:14,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.158e+01 2.765e+01 3.202e+01 3.631e+01 5.642e+01, threshold=6.403e+01, percent-clipped=0.0 2024-08-10 14:52:32,817 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.814e-01 2024-08-10 14:52:32,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=608120.0, ans=0.0 2024-08-10 14:52:34,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=608120.0, ans=0.04949747468305833 2024-08-10 14:52:35,264 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 14:52:36,322 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 14:52:42,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2850, loss[loss=0.1003, beats_loss=0.01246, ecapa_loss=0.0002347, whisper_loss=0.08553, over 22255.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01188, ecapa_loss=0.0002431, whisper_loss=0.09572, over 3861277.52 frames. 
], batch size: 89, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:52:45,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608220.0, ans=0.125 2024-08-10 14:52:46,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=608220.0, ans=0.125 2024-08-10 14:53:04,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=608320.0, ans=0.125 2024-08-10 14:53:07,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=1.98 vs. limit=15.0 2024-08-10 14:53:11,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=608420.0, ans=0.2 2024-08-10 14:53:14,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=608420.0, ans=0.0 2024-08-10 14:53:23,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=608520.0, ans=0.02 2024-08-10 14:53:40,088 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 14:53:41,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608620.0, ans=0.125 2024-08-10 14:53:47,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2900, loss[loss=0.1203, beats_loss=0.009554, ecapa_loss=0.0002507, whisper_loss=0.1083, over 17889.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01197, ecapa_loss=0.0002435, whisper_loss=0.09504, over 3877184.78 frames. 
], batch size: 71, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:53:53,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608720.0, ans=0.125 2024-08-10 14:54:24,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.824e+01 3.286e+01 3.731e+01 5.146e+01, threshold=6.573e+01, percent-clipped=0.0 2024-08-10 14:54:46,150 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 14:54:53,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 2950, loss[loss=0.09977, beats_loss=0.01319, ecapa_loss=0.0002595, whisper_loss=0.08398, over 20888.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01193, ecapa_loss=0.0002466, whisper_loss=0.0956, over 3917997.71 frames. ], batch size: 85, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:55:01,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=12.0 2024-08-10 14:55:03,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=609220.0, ans=0.1 2024-08-10 14:55:32,879 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 14:55:40,619 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-10 14:55:56,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=609620.0, ans=0.0 2024-08-10 14:55:56,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=609620.0, ans=0.125 2024-08-10 14:55:58,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3000, loss[loss=0.1172, beats_loss=0.01133, ecapa_loss=0.0002698, whisper_loss=0.1031, over 22394.00 frames. 
], tot_loss[loss=0.1106, beats_loss=0.0118, ecapa_loss=0.0002494, whisper_loss=0.09634, over 3929352.88 frames. ], batch size: 93, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:55:58,294 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 14:56:35,277 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on ASR_libri: loss=0.2643, beats_loss=0, ecapa_loss=0.0007548, whisper_loss=0.2568, over 922467.00 frames. 2024-08-10 14:56:52,869 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on SV_voxceleb1: loss=0.006405, beats_loss=0, ecapa_loss=0.0006405, whisper_loss=0, over 939242.00 frames. 2024-08-10 14:58:43,645 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on AT_audioset: loss=0.02683, beats_loss=0.02683, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 14:58:43,649 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 14:58:47,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609720.0, ans=0.125 2024-08-10 14:58:56,451 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 14:58:57,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-08-10 14:58:59,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=609820.0, ans=0.125 2024-08-10 14:59:04,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=609820.0, ans=0.125 2024-08-10 14:59:18,938 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
20 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-10 14:59:19,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 3.032e+01 3.491e+01 3.911e+01 5.761e+01, threshold=6.982e+01, percent-clipped=0.0 2024-08-10 14:59:29,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=610020.0, ans=0.0 2024-08-10 14:59:35,879 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 14:59:48,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3050, loss[loss=0.1086, beats_loss=0.01343, ecapa_loss=0.0001995, whisper_loss=0.09314, over 21919.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01177, ecapa_loss=0.0002483, whisper_loss=0.09705, over 3930658.39 frames. ], batch size: 88, lr: 1.28e-02, grad_scale: 34359738368.0 2024-08-10 14:59:53,914 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-10 15:00:00,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=610320.0, ans=0.125 2024-08-10 15:00:22,721 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 15:00:26,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=610520.0, ans=0.125 2024-08-10 15:00:29,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.09 vs. limit=15.0 2024-08-10 15:00:35,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-10 15:00:38,614 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
11 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 15:00:39,868 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 15:00:54,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3100, loss[loss=0.1222, beats_loss=0.01239, ecapa_loss=0.0002154, whisper_loss=0.1076, over 19404.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01185, ecapa_loss=0.0002476, whisper_loss=0.09643, over 3936826.66 frames. ], batch size: 73, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:00:56,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=610720.0, ans=0.0 2024-08-10 15:01:00,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=610720.0, ans=0.125 2024-08-10 15:01:04,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610720.0, ans=0.125 2024-08-10 15:01:30,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. limit=22.5 2024-08-10 15:01:32,067 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 24 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-10 15:01:33,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.743e+01 3.024e+01 3.559e+01 5.609e+01, threshold=6.048e+01, percent-clipped=0.0 2024-08-10 15:01:42,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=611020.0, ans=0.0 2024-08-10 15:01:44,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611020.0, ans=0.1 2024-08-10 15:01:45,427 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
22 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-10 15:01:48,003 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 15:01:49,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611120.0, ans=0.1 2024-08-10 15:01:58,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2024-08-10 15:02:00,701 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 15:02:01,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611120.0, ans=0.125 2024-08-10 15:02:03,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3150, loss[loss=0.1125, beats_loss=0.01077, ecapa_loss=0.0002763, whisper_loss=0.09897, over 22756.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01187, ecapa_loss=0.0002462, whisper_loss=0.09605, over 3922449.52 frames. ], batch size: 93, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:02:13,567 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 15:02:18,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=611320.0, ans=0.0 2024-08-10 15:02:20,895 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 15:02:26,839 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
18 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 15:02:27,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=611320.0, ans=0.0 2024-08-10 15:02:39,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=611420.0, ans=0.125 2024-08-10 15:03:03,103 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 15:03:09,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=611620.0, ans=0.125 2024-08-10 15:03:10,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611620.0, ans=0.1 2024-08-10 15:03:15,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3200, loss[loss=0.1143, beats_loss=0.01321, ecapa_loss=0.0002314, whisper_loss=0.09882, over 19961.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01192, ecapa_loss=0.0002451, whisper_loss=0.09611, over 3911863.18 frames. ], batch size: 80, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:03:23,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2024-08-10 15:03:25,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=611720.0, ans=0.125 2024-08-10 15:03:26,932 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-10 15:03:32,399 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 15:03:43,886 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 15:03:48,319 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 15:03:53,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=611920.0, ans=0.0 2024-08-10 15:03:56,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.779e+01 3.150e+01 3.545e+01 6.901e+01, threshold=6.301e+01, percent-clipped=2.0 2024-08-10 15:03:57,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2024-08-10 15:04:15,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=612120.0, ans=0.125 2024-08-10 15:04:17,164 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-10 15:04:21,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=612120.0, ans=0.125 2024-08-10 15:04:28,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3250, loss[loss=0.1039, beats_loss=0.01165, ecapa_loss=0.0002109, whisper_loss=0.09019, over 16676.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01192, ecapa_loss=0.0002463, whisper_loss=0.09642, over 3909999.65 frames. ], batch size: 62, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:04:42,594 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 12 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-10 15:04:53,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2024-08-10 15:05:40,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3300, loss[loss=0.105, beats_loss=0.01569, ecapa_loss=0.0002242, whisper_loss=0.08709, over 20086.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01197, ecapa_loss=0.0002451, whisper_loss=0.09609, over 3903942.72 frames. 
], batch size: 83, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:05:49,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=612720.0, ans=0.125 2024-08-10 15:06:13,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=612920.0, ans=0.125 2024-08-10 15:06:15,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=612920.0, ans=0.0 2024-08-10 15:06:22,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.755e+01 3.072e+01 3.647e+01 1.345e+02, threshold=6.143e+01, percent-clipped=1.0 2024-08-10 15:06:38,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=613120.0, ans=0.125 2024-08-10 15:06:54,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3350, loss[loss=0.12, beats_loss=0.01292, ecapa_loss=0.0002673, whisper_loss=0.1044, over 15066.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01195, ecapa_loss=0.000244, whisper_loss=0.09546, over 3886029.38 frames. ], batch size: 62, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:06:54,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=613220.0, ans=0.0 2024-08-10 15:07:40,399 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-10 15:07:43,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. 
limit=15.0 2024-08-10 15:07:58,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=613620.0, ans=0.125 2024-08-10 15:08:08,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3400, loss[loss=0.09113, beats_loss=0.01082, ecapa_loss=0.000275, whisper_loss=0.07756, over 14605.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01197, ecapa_loss=0.0002415, whisper_loss=0.09477, over 3868800.96 frames. ], batch size: 62, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:08:13,938 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 15:08:22,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2024-08-10 15:08:36,978 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 36 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-10 15:08:37,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=613920.0, ans=0.0 2024-08-10 15:08:37,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.10 vs. limit=15.0 2024-08-10 15:08:45,244 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 15:08:45,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=613920.0, ans=0.125 2024-08-10 15:08:49,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+01 2.884e+01 3.210e+01 3.796e+01 7.234e+01, threshold=6.419e+01, percent-clipped=1.0 2024-08-10 15:09:10,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2024-08-10 15:09:17,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614120.0, ans=0.1 2024-08-10 15:09:20,152 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 15:09:20,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=614220.0, ans=0.0 2024-08-10 15:09:21,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3450, loss[loss=0.1203, beats_loss=0.01082, ecapa_loss=0.0002857, whisper_loss=0.1066, over 16972.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01199, ecapa_loss=0.0002434, whisper_loss=0.09425, over 3858295.71 frames. ], batch size: 70, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:09:25,504 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 15:09:30,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-08-10 15:09:46,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614320.0, ans=0.125 2024-08-10 15:09:47,106 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 15:09:58,619 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 15:10:00,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=614420.0, ans=0.035 2024-08-10 15:10:03,418 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 15:10:03,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=614520.0, ans=0.2 2024-08-10 15:10:05,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-10 15:10:10,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=614520.0, ans=0.0 2024-08-10 15:10:14,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2024-08-10 15:10:15,394 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 15:10:15,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2024-08-10 15:10:16,979 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-10 15:10:31,837 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 15:10:34,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3500, loss[loss=0.1142, beats_loss=0.01181, ecapa_loss=0.0002333, whisper_loss=0.1001, over 23000.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01194, ecapa_loss=0.0002447, whisper_loss=0.09512, over 3899424.57 frames. 
], batch size: 91, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:10:45,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=614720.0, ans=0.0 2024-08-10 15:10:48,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=614820.0, ans=0.125 2024-08-10 15:10:55,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-08-10 15:10:56,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2024-08-10 15:10:59,426 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 15:11:06,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=614920.0, ans=0.2 2024-08-10 15:11:15,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+01 2.754e+01 3.128e+01 3.525e+01 7.630e+01, threshold=6.256e+01, percent-clipped=1.0 2024-08-10 15:11:18,218 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-10 15:11:23,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=615020.0, ans=0.125 2024-08-10 15:11:41,281 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 15:11:46,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3550, loss[loss=0.1212, beats_loss=0.008682, ecapa_loss=0.0002481, whisper_loss=0.1101, over 15052.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01186, ecapa_loss=0.000244, whisper_loss=0.09568, over 3919877.86 frames. 
], batch size: 57, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:11:51,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.15 vs. limit=15.0 2024-08-10 15:11:52,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=615220.0, ans=0.2 2024-08-10 15:11:55,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=615220.0, ans=0.125 2024-08-10 15:11:58,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=615220.0, ans=0.125 2024-08-10 15:12:04,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=615320.0, ans=0.04949747468305833 2024-08-10 15:12:27,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2024-08-10 15:12:30,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=615520.0, ans=0.125 2024-08-10 15:12:30,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-08-10 15:12:32,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=615520.0, ans=0.0 2024-08-10 15:12:35,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=615520.0, ans=0.125 2024-08-10 15:12:41,537 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
41 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 15:12:58,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3600, loss[loss=0.1137, beats_loss=0.01044, ecapa_loss=0.0002322, whisper_loss=0.1009, over 22708.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01191, ecapa_loss=0.0002436, whisper_loss=0.09521, over 3912021.61 frames. ], batch size: 88, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:13:17,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-08-10 15:13:21,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.72 vs. limit=22.5 2024-08-10 15:13:29,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=12.0 2024-08-10 15:13:32,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615920.0, ans=0.125 2024-08-10 15:13:37,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615920.0, ans=0.125 2024-08-10 15:13:39,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.884e+01 3.216e+01 3.547e+01 5.586e+01, threshold=6.432e+01, percent-clipped=0.0 2024-08-10 15:13:53,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=616020.0, ans=0.125 2024-08-10 15:13:56,166 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 15:14:03,058 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 15:14:11,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3650, loss[loss=0.1293, beats_loss=0.01027, ecapa_loss=0.000192, whisper_loss=0.1171, over 15729.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01186, ecapa_loss=0.000244, whisper_loss=0.09518, over 3876956.94 frames. ], batch size: 58, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:14:14,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=616220.0, ans=0.125 2024-08-10 15:14:48,635 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:15:12,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=616620.0, ans=0.125 2024-08-10 15:15:14,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=616620.0, ans=0.0 2024-08-10 15:15:17,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2024-08-10 15:15:23,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3700, loss[loss=0.1351, beats_loss=0.00858, ecapa_loss=0.0002502, whisper_loss=0.1241, over 17436.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.0118, ecapa_loss=0.0002442, whisper_loss=0.09552, over 3865985.88 frames. ], batch size: 65, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:15:39,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=616820.0, ans=0.5 2024-08-10 15:15:49,660 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-10 15:15:49,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=616820.0, ans=0.125 2024-08-10 15:15:52,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=616920.0, ans=0.0 2024-08-10 15:15:54,105 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 15:15:56,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=12.0 2024-08-10 15:16:05,217 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.736e+01 3.079e+01 3.558e+01 5.544e+01, threshold=6.157e+01, percent-clipped=0.0 2024-08-10 15:16:14,314 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 17 from LS+wenet, 26 from Vox, 48 fro AS 2024-08-10 15:16:36,121 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 11 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 15:16:37,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3750, loss[loss=0.07835, beats_loss=0.01473, ecapa_loss=0.0002203, whisper_loss=0.06142, over 13956.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01177, ecapa_loss=0.000243, whisper_loss=0.0964, over 3888146.04 frames. 
], batch size: 58, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:16:46,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=617220.0, ans=0.05 2024-08-10 15:16:57,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=617320.0, ans=0.125 2024-08-10 15:16:57,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=617320.0, ans=0.125 2024-08-10 15:16:59,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2024-08-10 15:17:10,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=617420.0, ans=0.0 2024-08-10 15:17:25,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=617520.0, ans=0.125 2024-08-10 15:17:32,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=617520.0, ans=0.125 2024-08-10 15:17:43,384 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 15:17:49,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3800, loss[loss=0.09252, beats_loss=0.01458, ecapa_loss=0.0001549, whisper_loss=0.0764, over 15371.00 frames. ], tot_loss[loss=0.1107, beats_loss=0.01182, ecapa_loss=0.0002442, whisper_loss=0.09648, over 3914533.80 frames. ], batch size: 55, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:17:58,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=617720.0, ans=0.0 2024-08-10 15:17:59,386 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
14 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 15:18:01,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=617720.0, ans=0.0 2024-08-10 15:18:07,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0 2024-08-10 15:18:12,637 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-10 15:18:17,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-10 15:18:20,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=617920.0, ans=0.2 2024-08-10 15:18:26,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=617920.0, ans=0.125 2024-08-10 15:18:30,982 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.844e+01 3.115e+01 3.732e+01 5.922e+01, threshold=6.230e+01, percent-clipped=0.0 2024-08-10 15:18:35,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=618020.0, ans=10.0 2024-08-10 15:18:40,143 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-10 15:18:47,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=618120.0, ans=0.0 2024-08-10 15:18:48,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=618120.0, ans=0.125 2024-08-10 15:18:52,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. 
limit=15.0 2024-08-10 15:19:02,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3850, loss[loss=0.08651, beats_loss=0.01464, ecapa_loss=0.0002427, whisper_loss=0.06943, over 14584.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01191, ecapa_loss=0.0002438, whisper_loss=0.09595, over 3922474.92 frames. ], batch size: 60, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:19:16,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=618220.0, ans=0.1 2024-08-10 15:19:22,559 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 15:19:33,194 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 15:19:47,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=618420.0, ans=0.2 2024-08-10 15:19:57,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.92 vs. limit=22.5 2024-08-10 15:19:59,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=618520.0, ans=0.125 2024-08-10 15:20:09,991 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:20:19,325 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 15:20:23,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618620.0, ans=0.1 2024-08-10 15:20:31,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3900, loss[loss=0.1043, beats_loss=0.01349, ecapa_loss=0.0002563, whisper_loss=0.08824, over 19829.00 frames. 
], tot_loss[loss=0.1108, beats_loss=0.01193, ecapa_loss=0.0002453, whisper_loss=0.09639, over 3930658.05 frames. ], batch size: 83, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:20:33,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-10 15:20:44,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0 2024-08-10 15:20:45,401 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 15:20:45,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=618720.0, ans=0.125 2024-08-10 15:21:09,407 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-10 15:21:10,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.02 vs. limit=15.0 2024-08-10 15:21:11,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:21:15,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=618920.0, ans=0.05 2024-08-10 15:21:24,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+01 3.061e+01 3.504e+01 4.098e+01 1.751e+02, threshold=7.008e+01, percent-clipped=3.0 2024-08-10 15:21:31,047 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 15:22:11,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 3950, loss[loss=0.107, beats_loss=0.01335, ecapa_loss=0.0002563, whisper_loss=0.09104, over 22842.00 frames. 
], tot_loss[loss=0.111, beats_loss=0.01188, ecapa_loss=0.0002464, whisper_loss=0.09662, over 3917161.48 frames. ], batch size: 95, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:22:32,406 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 15:22:32,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2024-08-10 15:22:36,527 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-10 15:22:55,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-10 15:22:56,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=619420.0, ans=0.125 2024-08-10 15:22:58,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-10 15:22:59,491 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-10 15:23:04,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619420.0, ans=0.1 2024-08-10 15:23:07,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=619420.0, ans=0.125 2024-08-10 15:23:10,704 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:23:22,625 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 15:23:43,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-08-10 15:23:54,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4000, loss[loss=0.1239, beats_loss=0.009601, ecapa_loss=0.0002208, whisper_loss=0.1121, over 24085.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01185, ecapa_loss=0.0002451, whisper_loss=0.0967, over 3925333.65 frames. ], batch size: 92, lr: 1.27e-02, grad_scale: 34359738368.0 2024-08-10 15:23:59,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619720.0, ans=0.125 2024-08-10 15:24:16,443 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 15:24:27,976 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-10 15:25:02,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+01 2.861e+01 3.318e+01 3.884e+01 5.554e+01, threshold=6.636e+01, percent-clipped=0.0 2024-08-10 15:25:24,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=620020.0, ans=0.0 2024-08-10 15:25:52,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4050, loss[loss=0.1226, beats_loss=0.01147, ecapa_loss=0.0002689, whisper_loss=0.1085, over 20042.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01176, ecapa_loss=0.0002464, whisper_loss=0.09667, over 3883236.09 frames. 
], batch size: 81, lr: 1.27e-02, grad_scale: 68719476736.0 2024-08-10 15:26:38,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=620420.0, ans=0.015 2024-08-10 15:26:52,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620420.0, ans=0.125 2024-08-10 15:27:00,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620420.0, ans=0.125 2024-08-10 15:27:14,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=620520.0, ans=0.0 2024-08-10 15:27:29,938 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 15:27:36,533 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 15:27:39,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=620620.0, ans=0.125 2024-08-10 15:27:46,497 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 15:27:50,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4100, loss[loss=0.09428, beats_loss=0.01229, ecapa_loss=0.00025, whisper_loss=0.07948, over 20502.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01174, ecapa_loss=0.0002475, whisper_loss=0.09689, over 3891081.92 frames. ], batch size: 84, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:28:19,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=620820.0, ans=0.125 2024-08-10 15:28:27,505 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 15:28:59,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+01 2.982e+01 3.358e+01 3.918e+01 5.492e+01, threshold=6.716e+01, percent-clipped=0.0 2024-08-10 15:29:05,668 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 15:29:11,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-08-10 15:29:13,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=621020.0, ans=0.125 2024-08-10 15:29:26,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-10 15:29:34,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4150, loss[loss=0.1089, beats_loss=0.01058, ecapa_loss=0.0002894, whisper_loss=0.09541, over 21543.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01184, ecapa_loss=0.0002478, whisper_loss=0.09628, over 3902406.67 frames. ], batch size: 91, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:29:40,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621220.0, ans=0.1 2024-08-10 15:29:48,608 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-10 15:30:11,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=621420.0, ans=0.0 2024-08-10 15:30:14,208 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 15:30:26,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.37 vs. 
limit=15.0 2024-08-10 15:30:31,598 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.387e+00 2024-08-10 15:30:49,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4200, loss[loss=0.09599, beats_loss=0.01485, ecapa_loss=0.0002238, whisper_loss=0.07891, over 19695.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01189, ecapa_loss=0.0002467, whisper_loss=0.09546, over 3890472.74 frames. ], batch size: 83, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:30:52,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=621720.0, ans=0.125 2024-08-10 15:30:57,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-10 15:31:04,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=621820.0, ans=0.0 2024-08-10 15:31:07,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=621820.0, ans=0.2 2024-08-10 15:31:21,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=621920.0, ans=0.0 2024-08-10 15:31:22,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.10 vs. limit=22.5 2024-08-10 15:31:31,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.809e+01 3.141e+01 3.651e+01 6.704e+01, threshold=6.282e+01, percent-clipped=0.0 2024-08-10 15:31:34,560 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
28 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 15:31:46,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=622020.0, ans=0.2 2024-08-10 15:31:51,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622120.0, ans=0.1 2024-08-10 15:31:55,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=622120.0, ans=0.125 2024-08-10 15:32:05,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4250, loss[loss=0.09783, beats_loss=0.01376, ecapa_loss=0.0002292, whisper_loss=0.08178, over 19462.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01182, ecapa_loss=0.0002447, whisper_loss=0.09598, over 3935745.76 frames. ], batch size: 81, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:32:05,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=622220.0, ans=0.125 2024-08-10 15:32:08,878 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-10 15:32:11,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=622220.0, ans=0.125 2024-08-10 15:32:13,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=622220.0, ans=0.125 2024-08-10 15:32:48,678 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 15:32:51,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=622520.0, ans=0.2 2024-08-10 15:32:52,693 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 15:33:02,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=622520.0, ans=0.0 2024-08-10 15:33:17,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=622620.0, ans=0.0 2024-08-10 15:33:19,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4300, loss[loss=0.1065, beats_loss=0.01132, ecapa_loss=0.000263, whisper_loss=0.09253, over 20235.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01176, ecapa_loss=0.0002447, whisper_loss=0.0962, over 3916234.00 frames. ], batch size: 82, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:33:39,787 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-10 15:33:59,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.798e+01 3.084e+01 3.774e+01 7.124e+01, threshold=6.168e+01, percent-clipped=2.0 2024-08-10 15:34:03,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-10 15:34:07,306 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 15:34:07,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=623020.0, ans=0.125 2024-08-10 15:34:28,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=623120.0, ans=0.0 2024-08-10 15:34:30,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4350, loss[loss=0.09751, beats_loss=0.01343, ecapa_loss=0.0002582, whisper_loss=0.0815, over 20542.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01176, ecapa_loss=0.0002431, whisper_loss=0.09581, over 3925993.47 frames. 
], batch size: 88, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:34:48,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=623320.0, ans=0.125 2024-08-10 15:35:06,665 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-10 15:35:50,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4400, loss[loss=0.09162, beats_loss=0.01332, ecapa_loss=0.0002439, whisper_loss=0.07586, over 21917.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01178, ecapa_loss=0.000242, whisper_loss=0.09551, over 3921950.86 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:35:52,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=623720.0, ans=0.2 2024-08-10 15:36:10,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=623820.0, ans=0.05 2024-08-10 15:36:19,773 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 15:36:29,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=623920.0, ans=0.125 2024-08-10 15:36:30,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.05 vs. limit=22.5 2024-08-10 15:36:38,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.220e+01 2.993e+01 3.424e+01 4.007e+01 6.509e+01, threshold=6.848e+01, percent-clipped=2.0 2024-08-10 15:36:52,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.75 vs. 
limit=15.0 2024-08-10 15:36:55,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.32 vs. limit=22.5 2024-08-10 15:36:56,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=624020.0, ans=0.125 2024-08-10 15:36:56,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-10 15:37:15,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4450, loss[loss=0.1041, beats_loss=0.0105, ecapa_loss=0.000304, whisper_loss=0.09051, over 21545.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01172, ecapa_loss=0.0002438, whisper_loss=0.09548, over 3904085.27 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:37:23,242 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-10 15:37:54,455 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 15:37:55,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2024-08-10 15:38:04,088 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 15:38:12,666 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:38:39,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4500, loss[loss=0.0777, beats_loss=0.01193, ecapa_loss=0.000205, whisper_loss=0.06372, over 15086.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01174, ecapa_loss=0.0002412, whisper_loss=0.09554, over 3940263.60 frames. 
], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:38:46,181 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 15:38:51,258 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 15:38:59,883 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 15:39:22,155 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-10 15:39:27,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+01 2.908e+01 3.221e+01 3.849e+01 6.109e+01, threshold=6.442e+01, percent-clipped=0.0 2024-08-10 15:40:01,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=625120.0, ans=0.0 2024-08-10 15:40:05,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4550, loss[loss=0.1124, beats_loss=0.01255, ecapa_loss=0.0002399, whisper_loss=0.09745, over 22458.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01168, ecapa_loss=0.0002443, whisper_loss=0.09572, over 3936659.41 frames. ], batch size: 89, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:40:27,527 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-10 15:40:37,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=625420.0, ans=0.125 2024-08-10 15:40:38,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625420.0, ans=0.1 2024-08-10 15:41:23,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4600, loss[loss=0.1097, beats_loss=0.009708, ecapa_loss=0.0002375, whisper_loss=0.09761, over 15190.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01173, ecapa_loss=0.0002443, whisper_loss=0.0963, over 3942009.84 frames. 
], batch size: 58, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:41:23,544 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-10 15:41:36,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=625720.0, ans=0.125 2024-08-10 15:41:50,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.99 vs. limit=22.5 2024-08-10 15:41:59,190 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 15:42:06,334 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:42:07,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.652e+01 3.147e+01 3.453e+01 6.048e+01, threshold=6.293e+01, percent-clipped=0.0 2024-08-10 15:42:18,564 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 15:42:24,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626020.0, ans=0.125 2024-08-10 15:42:42,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4650, loss[loss=0.1059, beats_loss=0.01193, ecapa_loss=0.0002904, whisper_loss=0.09106, over 21815.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01177, ecapa_loss=0.0002446, whisper_loss=0.09627, over 3943997.48 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:42:45,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=626220.0, ans=0.2 2024-08-10 15:42:50,479 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 15:42:57,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=626220.0, ans=0.2 2024-08-10 15:42:57,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2024-08-10 15:43:01,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=12.0 2024-08-10 15:43:16,869 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 15:43:18,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=626420.0, ans=0.2 2024-08-10 15:43:19,785 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 15:43:29,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=626420.0, ans=0.0 2024-08-10 15:43:36,194 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:43:37,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=626520.0, ans=0.04949747468305833 2024-08-10 15:43:40,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626520.0, ans=0.1 2024-08-10 15:43:41,934 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 15:43:59,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.46 vs. 
limit=12.0 2024-08-10 15:44:01,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=15.0 2024-08-10 15:44:03,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4700, loss[loss=0.1101, beats_loss=0.009947, ecapa_loss=0.0002296, whisper_loss=0.09784, over 14633.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01183, ecapa_loss=0.0002421, whisper_loss=0.09606, over 3946765.98 frames. ], batch size: 55, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:44:19,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2024-08-10 15:44:28,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=626820.0, ans=12.0 2024-08-10 15:44:31,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=626820.0, ans=0.125 2024-08-10 15:44:31,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=626820.0, ans=0.125 2024-08-10 15:44:32,341 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-10 15:44:38,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=626920.0, ans=0.125 2024-08-10 15:44:48,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.667e+01 3.139e+01 3.783e+01 7.574e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-10 15:44:56,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=627020.0, ans=0.125 2024-08-10 15:45:11,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=627120.0, ans=10.0 2024-08-10 15:45:24,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4750, loss[loss=0.09691, beats_loss=0.01259, ecapa_loss=0.0002584, whisper_loss=0.08174, over 19912.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01182, ecapa_loss=0.0002436, whisper_loss=0.09603, over 3952405.78 frames. ], batch size: 83, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:45:37,753 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-10 15:45:37,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627220.0, ans=0.125 2024-08-10 15:45:41,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=627320.0, ans=0.2 2024-08-10 15:45:51,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627320.0, ans=0.125 2024-08-10 15:45:51,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627320.0, ans=0.1 2024-08-10 15:46:11,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-08-10 15:46:18,197 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.556e-02 2024-08-10 15:46:21,323 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-10 15:46:21,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=627520.0, ans=0.125 2024-08-10 15:46:36,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.51 vs. limit=10.0 2024-08-10 15:46:43,167 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-10 15:46:47,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4800, loss[loss=0.1315, beats_loss=0.009263, ecapa_loss=0.0002668, whisper_loss=0.1196, over 20903.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01194, ecapa_loss=0.0002447, whisper_loss=0.09595, over 3960658.75 frames. 
], batch size: 82, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:46:58,433 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 15:47:07,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2024-08-10 15:47:15,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=627820.0, ans=0.2 2024-08-10 15:47:35,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.946e+01 3.351e+01 4.117e+01 7.010e+01, threshold=6.703e+01, percent-clipped=2.0 2024-08-10 15:47:47,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=628020.0, ans=22.5 2024-08-10 15:47:53,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=628120.0, ans=0.2 2024-08-10 15:48:01,238 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-10 15:48:03,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=628120.0, ans=0.125 2024-08-10 15:48:12,490 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4850, loss[loss=0.07633, beats_loss=0.0166, ecapa_loss=0.0002362, whisper_loss=0.05737, over 19064.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01195, ecapa_loss=0.0002438, whisper_loss=0.09638, over 3947875.62 frames. ], batch size: 77, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:48:22,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.71 vs. 
limit=15.0 2024-08-10 15:48:27,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=628320.0, ans=0.0 2024-08-10 15:48:27,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=628320.0, ans=0.0 2024-08-10 15:48:36,867 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 15:48:44,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628320.0, ans=0.1 2024-08-10 15:48:45,862 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 15:48:56,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=628420.0, ans=0.0 2024-08-10 15:49:10,798 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 15:49:25,152 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 15:49:28,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 15:49:35,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=628720.0, ans=0.0 2024-08-10 15:49:35,865 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4900, loss[loss=0.1264, beats_loss=0.01222, ecapa_loss=0.0001925, whisper_loss=0.1123, over 23849.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01185, ecapa_loss=0.0002419, whisper_loss=0.09668, over 3939475.31 frames. 
], batch size: 90, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:49:48,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=628720.0, ans=0.125 2024-08-10 15:49:50,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=628820.0, ans=0.125 2024-08-10 15:49:53,982 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 15:49:58,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=628820.0, ans=0.125 2024-08-10 15:50:00,629 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 15:50:06,446 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 15:50:08,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2024-08-10 15:50:09,198 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 15:50:19,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.794e+01 3.081e+01 3.669e+01 6.406e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-10 15:50:31,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=629020.0, ans=0.2 2024-08-10 15:50:44,394 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 15:50:54,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 4950, loss[loss=0.1163, beats_loss=0.01112, ecapa_loss=0.0002342, whisper_loss=0.1028, over 18919.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01191, ecapa_loss=0.0002417, whisper_loss=0.09495, over 3878193.97 frames. 
], batch size: 73, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:50:55,129 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-10 15:51:12,676 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 15:51:20,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2024-08-10 15:51:23,493 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-10 15:51:31,527 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 15:51:31,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=629420.0, ans=0.125 2024-08-10 15:51:55,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-10 15:51:55,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-10 15:52:12,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=629620.0, ans=0.125 2024-08-10 15:52:12,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-10 15:52:15,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5000, loss[loss=0.08381, beats_loss=0.01383, ecapa_loss=0.0002474, whisper_loss=0.06751, over 14150.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01187, ecapa_loss=0.0002428, whisper_loss=0.09563, over 3857320.06 frames. 
], batch size: 57, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:52:36,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2024-08-10 15:52:56,546 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-10 15:53:04,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 2.939e+01 3.385e+01 3.961e+01 1.332e+02, threshold=6.770e+01, percent-clipped=1.0 2024-08-10 15:53:07,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=630020.0, ans=0.2 2024-08-10 15:53:23,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=630120.0, ans=0.125 2024-08-10 15:53:37,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5050, loss[loss=0.09051, beats_loss=0.01447, ecapa_loss=0.0002582, whisper_loss=0.07346, over 21285.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01198, ecapa_loss=0.0002419, whisper_loss=0.09562, over 3903918.26 frames. ], batch size: 91, lr: 1.26e-02, grad_scale: 68719476736.0 2024-08-10 15:53:42,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=630220.0, ans=0.125 2024-08-10 15:53:46,183 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 15:53:46,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=630220.0, ans=0.2 2024-08-10 15:54:01,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=630320.0, ans=0.125 2024-08-10 15:54:11,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.59 vs. 
limit=15.0 2024-08-10 15:54:49,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=630620.0, ans=0.0 2024-08-10 15:54:52,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=630620.0, ans=0.125 2024-08-10 15:54:59,612 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5100, loss[loss=0.1178, beats_loss=0.01233, ecapa_loss=0.0001704, whisper_loss=0.1038, over 24579.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01198, ecapa_loss=0.0002392, whisper_loss=0.09597, over 3937821.82 frames. ], batch size: 91, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:54:59,797 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-10 15:55:00,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630720.0, ans=0.1 2024-08-10 15:55:25,762 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-10 15:55:27,217 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 15:55:44,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+01 2.958e+01 3.434e+01 3.932e+01 6.642e+01, threshold=6.868e+01, percent-clipped=0.0 2024-08-10 15:55:48,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=631020.0, ans=0.0 2024-08-10 15:56:20,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5150, loss[loss=0.1122, beats_loss=0.009845, ecapa_loss=0.0002542, whisper_loss=0.09985, over 17878.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01193, ecapa_loss=0.0002388, whisper_loss=0.09661, over 3939656.14 frames. 
], batch size: 70, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:56:21,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=631220.0, ans=0.2 2024-08-10 15:56:38,571 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-10 15:56:50,111 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 15:56:52,823 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-10 15:56:54,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=631420.0, ans=0.0 2024-08-10 15:56:55,897 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 15:57:09,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-08-10 15:57:21,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=10.0 2024-08-10 15:57:22,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=631620.0, ans=0.125 2024-08-10 15:57:26,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-08-10 15:57:31,492 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-10 15:57:35,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=631620.0, ans=0.125 2024-08-10 15:57:35,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2024-08-10 15:57:37,777 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5200, loss[loss=0.116, beats_loss=0.009407, ecapa_loss=0.0003764, whisper_loss=0.1028, over 20294.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01196, ecapa_loss=0.0002398, whisper_loss=0.09572, over 3913236.86 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:57:39,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631720.0, ans=0.1 2024-08-10 15:57:48,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-10 15:57:53,022 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 15:58:11,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631920.0, ans=0.1 2024-08-10 15:58:12,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=631920.0, ans=0.2 2024-08-10 15:58:19,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.945e+01 3.443e+01 4.072e+01 7.195e+01, threshold=6.886e+01, percent-clipped=1.0 2024-08-10 15:58:23,987 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-10 15:58:24,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=632020.0, ans=0.125 2024-08-10 15:58:25,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=632020.0, ans=0.125 2024-08-10 15:58:42,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=632120.0, ans=0.04949747468305833 2024-08-10 15:58:47,451 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 15:58:51,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5250, loss[loss=0.1354, beats_loss=0.01003, ecapa_loss=0.0002309, whisper_loss=0.1231, over 23487.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01192, ecapa_loss=0.0002396, whisper_loss=0.09542, over 3915860.07 frames. ], batch size: 91, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 15:58:51,751 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 15:58:56,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=632220.0, ans=0.0 2024-08-10 15:58:57,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5 2024-08-10 15:59:15,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632320.0, ans=0.1 2024-08-10 15:59:35,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=632420.0, ans=0.0 2024-08-10 15:59:42,868 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 15:59:44,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632520.0, ans=0.1 2024-08-10 16:00:07,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5300, loss[loss=0.08949, beats_loss=0.01283, ecapa_loss=0.0001891, whisper_loss=0.07477, over 14559.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01186, ecapa_loss=0.0002398, whisper_loss=0.09552, over 3903821.49 frames. ], batch size: 56, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:00:14,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=632720.0, ans=0.5 2024-08-10 16:00:17,708 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-10 16:00:23,782 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 16:00:43,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=632920.0, ans=0.125 2024-08-10 16:00:47,844 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.840e+01 3.204e+01 3.763e+01 6.547e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 16:01:18,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5350, loss[loss=0.08966, beats_loss=0.01209, ecapa_loss=0.0002758, whisper_loss=0.07482, over 18783.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01182, ecapa_loss=0.0002417, whisper_loss=0.09546, over 3857105.20 frames. ], batch size: 80, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:01:22,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=633220.0, ans=0.125 2024-08-10 16:01:39,027 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-10 16:01:39,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=633320.0, ans=10.0 2024-08-10 16:01:41,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=633320.0, ans=0.025 2024-08-10 16:01:50,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633420.0, ans=0.1 2024-08-10 16:02:01,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=633520.0, ans=0.2 2024-08-10 16:02:06,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=633520.0, ans=0.125 2024-08-10 16:02:26,199 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 16:02:28,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5400, loss[loss=0.1084, beats_loss=0.01199, ecapa_loss=0.0002703, whisper_loss=0.09366, over 21778.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01176, ecapa_loss=0.000242, whisper_loss=0.0959, over 3860108.06 frames. ], batch size: 92, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:02:55,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=633920.0, ans=0.125 2024-08-10 16:02:56,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=633920.0, ans=0.95 2024-08-10 16:03:06,404 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 16:03:07,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.970e+01 3.287e+01 3.858e+01 5.350e+01, threshold=6.575e+01, percent-clipped=0.0 2024-08-10 16:03:27,645 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-10 16:03:34,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.904e-01 2024-08-10 16:03:37,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5450, loss[loss=0.133, beats_loss=0.00941, ecapa_loss=0.0002178, whisper_loss=0.1215, over 18908.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0118, ecapa_loss=0.0002402, whisper_loss=0.09574, over 3880084.05 frames. ], batch size: 71, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:03:44,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=634220.0, ans=0.125 2024-08-10 16:03:49,729 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-10 16:03:49,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=634320.0, ans=0.125 2024-08-10 16:03:59,139 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
24 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-10 16:04:02,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=634320.0, ans=10.0 2024-08-10 16:04:07,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=634420.0, ans=0.125 2024-08-10 16:04:17,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=634520.0, ans=0.125 2024-08-10 16:04:19,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=634520.0, ans=0.2 2024-08-10 16:04:31,482 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 16:04:44,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5500, loss[loss=0.1185, beats_loss=0.01189, ecapa_loss=0.0002336, whisper_loss=0.1043, over 22195.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01174, ecapa_loss=0.0002407, whisper_loss=0.09623, over 3865836.27 frames. ], batch size: 88, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:04:48,421 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-10 16:04:57,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. 
limit=15.0 2024-08-10 16:05:08,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=634820.0, ans=0.0 2024-08-10 16:05:22,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.955e+01 3.201e+01 3.849e+01 6.033e+01, threshold=6.402e+01, percent-clipped=0.0 2024-08-10 16:05:47,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2024-08-10 16:05:52,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5550, loss[loss=0.09468, beats_loss=0.01181, ecapa_loss=0.0002971, whisper_loss=0.07989, over 21485.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01188, ecapa_loss=0.0002392, whisper_loss=0.09546, over 3899957.65 frames. ], batch size: 92, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:05:53,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635220.0, ans=0.1 2024-08-10 16:06:09,149 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 16:06:13,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635320.0, ans=0.125 2024-08-10 16:06:21,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=635420.0, ans=0.125 2024-08-10 16:06:27,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=635420.0, ans=0.05 2024-08-10 16:06:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635520.0, ans=0.1 2024-08-10 16:06:36,195 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 16:06:47,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=635620.0, ans=0.2 2024-08-10 16:06:58,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5600, loss[loss=0.1142, beats_loss=0.01336, ecapa_loss=0.0001872, whisper_loss=0.09892, over 18074.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01193, ecapa_loss=0.0002399, whisper_loss=0.09484, over 3888437.15 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:07:05,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=635720.0, ans=0.125 2024-08-10 16:07:10,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=635820.0, ans=0.1 2024-08-10 16:07:19,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=15.0 2024-08-10 16:07:19,927 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-10 16:07:20,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=635820.0, ans=0.09899494936611666 2024-08-10 16:07:25,353 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 16:07:29,283 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-10 16:07:33,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=635920.0, ans=0.125 2024-08-10 16:07:33,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=635920.0, ans=0.0 2024-08-10 16:07:35,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.708e+01 3.041e+01 3.496e+01 5.299e+01, threshold=6.081e+01, percent-clipped=0.0 2024-08-10 16:07:39,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636020.0, ans=0.125 2024-08-10 16:07:55,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=636120.0, ans=0.125 2024-08-10 16:07:59,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=636120.0, ans=0.1 2024-08-10 16:08:04,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5650, loss[loss=0.1116, beats_loss=0.01005, ecapa_loss=0.0002427, whisper_loss=0.09917, over 17667.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01184, ecapa_loss=0.0002402, whisper_loss=0.09522, over 3882403.91 frames. 
], batch size: 70, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:08:15,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=636220.0, ans=0.125 2024-08-10 16:08:26,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=636320.0, ans=0.2 2024-08-10 16:08:34,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636420.0, ans=0.125 2024-08-10 16:08:46,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=636520.0, ans=0.0 2024-08-10 16:08:55,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=636520.0, ans=0.0 2024-08-10 16:08:56,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=636620.0, ans=0.125 2024-08-10 16:09:01,744 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-10 16:09:06,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=636620.0, ans=0.2 2024-08-10 16:09:10,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5700, loss[loss=0.1303, beats_loss=0.0105, ecapa_loss=0.0001799, whisper_loss=0.118, over 18713.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01179, ecapa_loss=0.0002389, whisper_loss=0.09605, over 3906307.19 frames. ], batch size: 68, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:09:11,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=636720.0, ans=0.125 2024-08-10 16:09:27,657 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-10 16:09:48,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+01 2.928e+01 3.301e+01 4.183e+01 7.157e+01, threshold=6.602e+01, percent-clipped=2.0 2024-08-10 16:09:55,583 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 16:09:58,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=637020.0, ans=0.0 2024-08-10 16:10:02,590 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 16:10:09,064 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 16:10:16,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=637120.0, ans=0.125 2024-08-10 16:10:19,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5750, loss[loss=0.1112, beats_loss=0.01077, ecapa_loss=0.0002706, whisper_loss=0.09777, over 14416.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01173, ecapa_loss=0.0002398, whisper_loss=0.0961, over 3893895.04 frames. ], batch size: 55, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:10:27,970 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-10 16:10:28,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. 
limit=6.0 2024-08-10 16:10:29,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=637220.0, ans=0.0 2024-08-10 16:11:08,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=637520.0, ans=0.125 2024-08-10 16:11:28,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5800, loss[loss=0.1392, beats_loss=0.009373, ecapa_loss=0.0002815, whisper_loss=0.127, over 22961.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01177, ecapa_loss=0.0002424, whisper_loss=0.09578, over 3905547.04 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:11:30,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=637720.0, ans=0.025 2024-08-10 16:11:51,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=637820.0, ans=0.125 2024-08-10 16:12:01,291 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-10 16:12:03,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-10 16:12:05,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637920.0, ans=0.1 2024-08-10 16:12:07,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.715e+01 3.192e+01 3.464e+01 4.938e+01, threshold=6.385e+01, percent-clipped=0.0 2024-08-10 16:12:38,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5850, loss[loss=0.09283, beats_loss=0.01211, ecapa_loss=0.0002658, whisper_loss=0.07806, over 17907.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01181, ecapa_loss=0.0002407, whisper_loss=0.0958, over 3908684.74 frames. 
], batch size: 74, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:12:45,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=638220.0, ans=0.125 2024-08-10 16:12:53,494 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 16:12:56,953 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-10 16:12:58,155 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-10 16:13:10,351 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 16:13:16,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=638420.0, ans=0.125 2024-08-10 16:13:48,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5900, loss[loss=0.136, beats_loss=0.01044, ecapa_loss=0.0002122, whisper_loss=0.1234, over 24678.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01187, ecapa_loss=0.0002405, whisper_loss=0.09525, over 3894460.49 frames. 
], batch size: 93, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:14:05,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=638820.0, ans=0.125 2024-08-10 16:14:26,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.959e+01 3.304e+01 3.845e+01 6.831e+01, threshold=6.608e+01, percent-clipped=1.0 2024-08-10 16:14:29,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639020.0, ans=0.1 2024-08-10 16:14:37,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639020.0, ans=0.1 2024-08-10 16:14:39,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=639020.0, ans=0.0 2024-08-10 16:14:40,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=639020.0, ans=0.025 2024-08-10 16:14:56,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 5950, loss[loss=0.09869, beats_loss=0.01399, ecapa_loss=0.0002322, whisper_loss=0.08238, over 21803.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01194, ecapa_loss=0.0002396, whisper_loss=0.09451, over 3879099.40 frames. ], batch size: 90, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:15:05,017 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 16:15:09,344 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 16:15:10,399 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 16:15:10,770 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:15:22,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=639320.0, ans=0.0 2024-08-10 16:15:30,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:15:41,489 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-10 16:15:46,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=639520.0, ans=0.125 2024-08-10 16:15:48,806 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 16:15:58,702 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 16:16:07,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6000, loss[loss=0.1062, beats_loss=0.01309, ecapa_loss=0.0002043, whisper_loss=0.09109, over 19881.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01194, ecapa_loss=0.0002387, whisper_loss=0.09494, over 3893342.55 frames. ], batch size: 78, lr: 1.25e-02, grad_scale: 68719476736.0 2024-08-10 16:16:07,884 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 16:16:49,341 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on ASR_libri: loss=0.2642, beats_loss=0, ecapa_loss=0.0007414, whisper_loss=0.2567, over 922467.00 frames. 2024-08-10 16:17:08,525 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on SV_voxceleb1: loss=0.006164, beats_loss=0, ecapa_loss=0.0006164, whisper_loss=0, over 939242.00 frames. 
2024-08-10 16:19:02,524 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on AT_audioset: loss=0.02682, beats_loss=0.02682, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 16:19:02,529 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 16:19:03,952 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 16:19:16,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=639820.0, ans=0.2 2024-08-10 16:19:19,759 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 16:19:32,611 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-10 16:19:35,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639920.0, ans=0.1 2024-08-10 16:19:40,825 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-64000.pt 2024-08-10 16:19:45,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.869e+01 3.209e+01 3.631e+01 6.157e+01, threshold=6.418e+01, percent-clipped=0.0 2024-08-10 16:19:50,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640020.0, ans=0.125 2024-08-10 16:19:53,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=640020.0, ans=0.0 2024-08-10 16:19:53,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=640020.0, ans=0.125 2024-08-10 
16:19:53,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-10 16:19:59,405 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-10 16:20:02,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=12.0 2024-08-10 16:20:06,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=640120.0, ans=0.0 2024-08-10 16:20:15,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6050, loss[loss=0.1096, beats_loss=0.01393, ecapa_loss=0.0001692, whisper_loss=0.09397, over 18363.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01188, ecapa_loss=0.0002385, whisper_loss=0.09509, over 3881028.66 frames. ], batch size: 69, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:20:56,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640420.0, ans=0.1 2024-08-10 16:21:32,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6100, loss[loss=0.06855, beats_loss=0.01582, ecapa_loss=0.00029, whisper_loss=0.04982, over 13099.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01188, ecapa_loss=0.0002396, whisper_loss=0.09497, over 3867064.79 frames. ], batch size: 55, lr: 1.25e-02, grad_scale: 137438953472.0 2024-08-10 16:21:35,359 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-10 16:21:39,046 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 16:21:43,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=640720.0, ans=0.04949747468305833 2024-08-10 16:21:44,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=640720.0, ans=0.0 2024-08-10 16:21:47,056 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 16:21:54,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-10 16:21:55,201 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 16:22:15,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.283e+01 3.045e+01 3.489e+01 4.204e+01 8.442e+01, threshold=6.977e+01, percent-clipped=4.0 2024-08-10 16:22:44,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=641120.0, ans=0.0 2024-08-10 16:22:45,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=641120.0, ans=0.2 2024-08-10 16:22:48,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6150, loss[loss=0.1014, beats_loss=0.01251, ecapa_loss=0.0002516, whisper_loss=0.08637, over 23198.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01188, ecapa_loss=0.0002394, whisper_loss=0.09465, over 3885592.61 frames. ], batch size: 95, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:23:00,344 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-10 16:23:15,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. 
limit=15.0 2024-08-10 16:23:34,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641520.0, ans=0.125 2024-08-10 16:23:37,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=641520.0, ans=0.2 2024-08-10 16:23:41,894 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 16:23:46,162 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 16:23:52,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641620.0, ans=0.0 2024-08-10 16:24:02,801 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 16:24:03,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6200, loss[loss=0.123, beats_loss=0.01083, ecapa_loss=0.0002425, whisper_loss=0.1097, over 22092.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01181, ecapa_loss=0.0002408, whisper_loss=0.09587, over 3938204.70 frames. ], batch size: 88, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:24:08,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=641720.0, ans=0.025 2024-08-10 16:24:33,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=641920.0, ans=0.125 2024-08-10 16:24:39,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. 
limit=15.0 2024-08-10 16:24:39,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=641920.0, ans=0.05 2024-08-10 16:24:42,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.777e+01 3.185e+01 3.780e+01 9.777e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 16:24:45,622 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-10 16:24:58,670 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 16:25:01,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642120.0, ans=0.125 2024-08-10 16:25:12,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-10 16:25:16,337 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6250, loss[loss=0.1219, beats_loss=0.01261, ecapa_loss=0.000238, whisper_loss=0.1069, over 15358.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01179, ecapa_loss=0.0002402, whisper_loss=0.09601, over 3952869.98 frames. ], batch size: 61, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:26:01,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-10 16:26:08,302 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
29 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-10 16:26:22,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=642620.0, ans=0.125 2024-08-10 16:26:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=642620.0, ans=0.04949747468305833 2024-08-10 16:26:31,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6300, loss[loss=0.1109, beats_loss=0.01047, ecapa_loss=0.0002976, whisper_loss=0.09745, over 21405.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01177, ecapa_loss=0.0002399, whisper_loss=0.09625, over 3948400.15 frames. ], batch size: 92, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:26:49,400 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 16:26:50,943 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:26:55,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=642820.0, ans=0.125 2024-08-10 16:26:59,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=642820.0, ans=0.125 2024-08-10 16:27:04,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=642920.0, ans=0.125 2024-08-10 16:27:13,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=642920.0, ans=0.2 2024-08-10 16:27:14,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.923e+01 3.266e+01 3.609e+01 6.240e+01, threshold=6.531e+01, percent-clipped=0.0 2024-08-10 16:27:20,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=643020.0, ans=0.0 2024-08-10 
16:27:26,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=643020.0, ans=0.95 2024-08-10 16:27:46,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6350, loss[loss=0.108, beats_loss=0.01138, ecapa_loss=0.000302, whisper_loss=0.09357, over 14817.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01181, ecapa_loss=0.0002404, whisper_loss=0.09579, over 3946326.34 frames. ], batch size: 62, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:27:59,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643320.0, ans=0.1 2024-08-10 16:28:04,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=643320.0, ans=0.125 2024-08-10 16:28:19,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=12.0 2024-08-10 16:28:37,152 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 16:28:39,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=643520.0, ans=0.0 2024-08-10 16:28:54,214 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.173e-03 2024-08-10 16:28:57,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6400, loss[loss=0.09417, beats_loss=0.01046, ecapa_loss=0.0002312, whisper_loss=0.08139, over 21770.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01178, ecapa_loss=0.0002419, whisper_loss=0.0957, over 3939442.53 frames. ], batch size: 84, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:28:58,073 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 16:29:12,426 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-10 16:29:28,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643920.0, ans=0.1 2024-08-10 16:29:34,534 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-10 16:29:36,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+01 2.802e+01 3.219e+01 3.654e+01 6.592e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-10 16:29:44,349 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-10 16:29:53,696 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 16:29:59,084 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 16:30:07,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6450, loss[loss=0.1074, beats_loss=0.01201, ecapa_loss=0.0002144, whisper_loss=0.09323, over 19307.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01179, ecapa_loss=0.000242, whisper_loss=0.09584, over 3957525.51 frames. ], batch size: 73, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:30:09,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=644220.0, ans=0.04949747468305833 2024-08-10 16:30:12,374 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 16:30:20,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=644320.0, ans=0.125 2024-08-10 16:30:30,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-08-10 16:30:46,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644520.0, ans=0.1 2024-08-10 16:30:50,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=644520.0, ans=0.04949747468305833 2024-08-10 16:30:58,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644520.0, ans=0.1 2024-08-10 16:31:04,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=644620.0, ans=0.2 2024-08-10 16:31:06,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=644620.0, ans=0.125 2024-08-10 16:31:08,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=644620.0, ans=0.125 2024-08-10 16:31:14,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6500, loss[loss=0.1141, beats_loss=0.01304, ecapa_loss=0.000285, whisper_loss=0.09824, over 22199.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01174, ecapa_loss=0.0002417, whisper_loss=0.09678, over 3965133.40 frames. ], batch size: 92, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:31:30,129 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 16:31:53,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 3.031e+01 3.251e+01 3.712e+01 6.418e+01, threshold=6.501e+01, percent-clipped=0.0 2024-08-10 16:31:53,436 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 16:31:57,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=645020.0, ans=0.05 2024-08-10 16:32:00,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=645020.0, ans=0.07 2024-08-10 16:32:13,889 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 16:32:23,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6550, loss[loss=0.1165, beats_loss=0.01088, ecapa_loss=0.0002696, whisper_loss=0.103, over 21381.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01174, ecapa_loss=0.0002409, whisper_loss=0.09663, over 3961164.02 frames. ], batch size: 90, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:32:40,563 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 16:32:49,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=645320.0, ans=0.0 2024-08-10 16:32:50,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=645420.0, ans=0.0 2024-08-10 16:32:52,954 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 16:32:54,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.20 vs. 
limit=15.0 2024-08-10 16:32:57,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=645420.0, ans=0.0 2024-08-10 16:33:20,206 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 16:33:27,325 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 16:33:32,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6600, loss[loss=0.1089, beats_loss=0.01242, ecapa_loss=0.0002147, whisper_loss=0.09437, over 22804.00 frames. ], tot_loss[loss=0.111, beats_loss=0.0117, ecapa_loss=0.0002395, whisper_loss=0.09688, over 3967480.65 frames. ], batch size: 92, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:33:35,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2024-08-10 16:34:07,041 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 16:34:11,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+01 2.864e+01 3.355e+01 3.985e+01 6.693e+01, threshold=6.710e+01, percent-clipped=1.0 2024-08-10 16:34:14,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=646020.0, ans=0.125 2024-08-10 16:34:26,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=646020.0, ans=0.0 2024-08-10 16:34:28,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=646120.0, ans=0.125 2024-08-10 16:34:28,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.29 vs. 
limit=15.0 2024-08-10 16:34:31,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646120.0, ans=0.1 2024-08-10 16:34:43,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6650, loss[loss=0.09647, beats_loss=0.01477, ecapa_loss=0.0002305, whisper_loss=0.07939, over 17690.00 frames. ], tot_loss[loss=0.1113, beats_loss=0.01172, ecapa_loss=0.0002392, whisper_loss=0.09724, over 4009199.80 frames. ], batch size: 73, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:34:57,173 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-10 16:35:07,730 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 16:35:15,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-10 16:35:22,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=646420.0, ans=0.0 2024-08-10 16:35:22,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-10 16:35:35,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=646520.0, ans=0.2 2024-08-10 16:35:55,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6700, loss[loss=0.09348, beats_loss=0.01222, ecapa_loss=0.0002432, whisper_loss=0.07883, over 14313.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01173, ecapa_loss=0.0002403, whisper_loss=0.09681, over 3990594.50 frames. 
], batch size: 57, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:36:04,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=646720.0, ans=0.0 2024-08-10 16:36:15,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=646820.0, ans=0.0 2024-08-10 16:36:33,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646920.0, ans=0.1 2024-08-10 16:36:34,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.879e+01 3.180e+01 3.709e+01 5.171e+01, threshold=6.361e+01, percent-clipped=0.0 2024-08-10 16:36:37,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=647020.0, ans=0.1 2024-08-10 16:36:37,997 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.137e+05 2024-08-10 16:36:39,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=647020.0, ans=0.2 2024-08-10 16:36:39,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=647020.0, ans=0.125 2024-08-10 16:36:52,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=647120.0, ans=0.125 2024-08-10 16:36:54,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-10 16:37:05,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6750, loss[loss=0.1143, beats_loss=0.01309, ecapa_loss=0.0002149, whisper_loss=0.09904, over 22940.00 frames. 
], tot_loss[loss=0.1112, beats_loss=0.01172, ecapa_loss=0.000241, whisper_loss=0.09707, over 3979964.97 frames. ], batch size: 88, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:37:24,321 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-10 16:37:27,318 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 16:37:32,614 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 16:37:32,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=647420.0, ans=0.07 2024-08-10 16:37:38,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=647420.0, ans=0.0 2024-08-10 16:37:43,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=647420.0, ans=0.2 2024-08-10 16:37:56,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647520.0, ans=0.1 2024-08-10 16:37:56,463 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.771e-01 2024-08-10 16:37:56,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2024-08-10 16:38:02,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647620.0, ans=0.125 2024-08-10 16:38:05,420 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 16:38:12,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6800, loss[loss=0.1174, beats_loss=0.01305, ecapa_loss=0.0002311, whisper_loss=0.1021, over 21799.00 frames. 
], tot_loss[loss=0.1112, beats_loss=0.01172, ecapa_loss=0.0002409, whisper_loss=0.09706, over 3981345.89 frames. ], batch size: 89, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:38:13,134 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 16:38:18,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2024-08-10 16:38:26,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=647820.0, ans=0.2 2024-08-10 16:38:38,083 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-10 16:38:44,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=647920.0, ans=0.125 2024-08-10 16:38:49,453 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-10 16:38:49,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=647920.0, ans=0.125 2024-08-10 16:38:53,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.892e+01 3.322e+01 4.059e+01 7.063e+01, threshold=6.643e+01, percent-clipped=1.0 2024-08-10 16:38:59,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=648020.0, ans=0.125 2024-08-10 16:39:05,757 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 16:39:21,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.79 vs. 
limit=15.0 2024-08-10 16:39:21,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-10 16:39:22,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=648220.0, ans=12.0 2024-08-10 16:39:22,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6850, loss[loss=0.09864, beats_loss=0.01213, ecapa_loss=0.0001756, whisper_loss=0.08475, over 17988.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01177, ecapa_loss=0.0002389, whisper_loss=0.09681, over 3945624.98 frames. ], batch size: 64, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:39:30,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-10 16:39:38,064 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 16:39:44,524 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 16:39:55,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=648420.0, ans=0.125 2024-08-10 16:39:59,016 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-10 16:40:22,838 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 16:40:26,675 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 16:40:28,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-10 16:40:30,607 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
12 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-10 16:40:31,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6900, loss[loss=0.07671, beats_loss=0.01628, ecapa_loss=0.0001485, whisper_loss=0.05894, over 14596.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01178, ecapa_loss=0.0002378, whisper_loss=0.09671, over 3917774.50 frames. ], batch size: 59, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:40:41,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-10 16:40:44,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=648820.0, ans=0.95 2024-08-10 16:40:48,894 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 16:41:02,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648920.0, ans=0.125 2024-08-10 16:41:06,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=648920.0, ans=0.125 2024-08-10 16:41:10,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.858e+01 3.304e+01 3.695e+01 5.634e+01, threshold=6.608e+01, percent-clipped=0.0 2024-08-10 16:41:29,061 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-10 16:41:36,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=649120.0, ans=0.025 2024-08-10 16:41:39,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. 
limit=15.0 2024-08-10 16:41:40,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 6950, loss[loss=0.1176, beats_loss=0.01189, ecapa_loss=0.0001758, whisper_loss=0.1039, over 14985.00 frames. ], tot_loss[loss=0.1108, beats_loss=0.01181, ecapa_loss=0.0002374, whisper_loss=0.09662, over 3893937.93 frames. ], batch size: 58, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:41:45,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=649220.0, ans=0.125 2024-08-10 16:42:07,526 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-10 16:42:27,624 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 16:42:37,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=649620.0, ans=0.2 2024-08-10 16:42:42,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=649620.0, ans=0.0 2024-08-10 16:42:44,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=649620.0, ans=0.2 2024-08-10 16:42:48,338 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.481e+05 2024-08-10 16:42:49,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7000, loss[loss=0.1314, beats_loss=0.008144, ecapa_loss=0.0002877, whisper_loss=0.1204, over 21788.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01169, ecapa_loss=0.0002394, whisper_loss=0.09681, over 3877958.23 frames. ], batch size: 84, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:42:51,834 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 16:42:59,609 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
16 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 16:43:07,496 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-10 16:43:25,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+01 2.800e+01 3.263e+01 3.998e+01 9.402e+01, threshold=6.527e+01, percent-clipped=1.0 2024-08-10 16:43:28,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=650020.0, ans=0.0 2024-08-10 16:43:31,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2024-08-10 16:43:47,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650120.0, ans=0.125 2024-08-10 16:43:55,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7050, loss[loss=0.1044, beats_loss=0.01425, ecapa_loss=0.0002096, whisper_loss=0.08801, over 19793.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01174, ecapa_loss=0.0002395, whisper_loss=0.09677, over 3890088.02 frames. ], batch size: 79, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:44:11,730 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 16:44:54,097 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 16:44:55,478 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 16:44:57,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=650620.0, ans=0.125 2024-08-10 16:45:01,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7100, loss[loss=0.1015, beats_loss=0.0122, ecapa_loss=0.0002085, whisper_loss=0.08718, over 21717.00 frames. 
], tot_loss[loss=0.1108, beats_loss=0.01168, ecapa_loss=0.0002377, whisper_loss=0.09672, over 3882113.59 frames. ], batch size: 88, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:45:09,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650720.0, ans=0.1 2024-08-10 16:45:39,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.645e+01 3.161e+01 3.535e+01 5.692e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 16:45:49,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651020.0, ans=0.0 2024-08-10 16:46:08,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7150, loss[loss=0.1267, beats_loss=0.01024, ecapa_loss=0.0002727, whisper_loss=0.1137, over 19964.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01172, ecapa_loss=0.000238, whisper_loss=0.09617, over 3878548.36 frames. ], batch size: 80, lr: 1.24e-02, grad_scale: 137438953472.0 2024-08-10 16:46:15,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-10 16:46:52,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=651520.0, ans=0.125 2024-08-10 16:47:14,751 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-10 16:47:17,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7200, loss[loss=0.1215, beats_loss=0.0117, ecapa_loss=0.0002762, whisper_loss=0.1071, over 17058.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01174, ecapa_loss=0.0002375, whisper_loss=0.09623, over 3852924.33 frames. 
], batch size: 71, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:47:34,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651820.0, ans=0.125 2024-08-10 16:47:56,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.798e+01 3.257e+01 4.005e+01 1.167e+02, threshold=6.513e+01, percent-clipped=2.0 2024-08-10 16:48:16,289 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-10 16:48:26,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7250, loss[loss=0.1165, beats_loss=0.0108, ecapa_loss=0.0002138, whisper_loss=0.1035, over 17362.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01177, ecapa_loss=0.0002383, whisper_loss=0.09551, over 3839651.39 frames. ], batch size: 65, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:48:28,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=652220.0, ans=0.125 2024-08-10 16:48:30,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=652220.0, ans=0.125 2024-08-10 16:48:47,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. 
limit=22.5 2024-08-10 16:48:48,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=652320.0, ans=0.125 2024-08-10 16:48:56,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=652420.0, ans=0.125 2024-08-10 16:49:04,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=652420.0, ans=0.125 2024-08-10 16:49:19,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2024-08-10 16:49:33,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652620.0, ans=0.1 2024-08-10 16:49:34,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2024-08-10 16:49:38,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7300, loss[loss=0.1023, beats_loss=0.01278, ecapa_loss=0.0001862, whisper_loss=0.0877, over 18413.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01181, ecapa_loss=0.0002376, whisper_loss=0.09596, over 3875131.58 frames. ], batch size: 71, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:49:56,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=652820.0, ans=0.0 2024-08-10 16:50:15,306 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 16:50:16,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.741e+01 3.048e+01 3.552e+01 4.958e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-10 16:50:23,665 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 16:50:41,816 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:50:46,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7350, loss[loss=0.1331, beats_loss=0.009517, ecapa_loss=0.0002335, whisper_loss=0.1212, over 18956.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01177, ecapa_loss=0.000239, whisper_loss=0.09614, over 3912781.21 frames. ], batch size: 73, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:50:50,860 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 from AS 2024-08-10 16:50:55,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653220.0, ans=0.1 2024-08-10 16:51:02,226 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.759e+05 2024-08-10 16:51:06,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=653320.0, ans=0.1 2024-08-10 16:51:31,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5 2024-08-10 16:51:51,642 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 20 from Vox, 20 from AS 2024-08-10 16:51:57,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7400, loss[loss=0.08975, beats_loss=0.01427, ecapa_loss=0.0001553, whisper_loss=0.07393, over 17847.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.0117, ecapa_loss=0.0002388, whisper_loss=0.09565, over 3879712.74 frames.
], batch size: 69, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:52:00,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=653720.0, ans=0.125 2024-08-10 16:52:01,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=653720.0, ans=0.2 2024-08-10 16:52:07,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=653720.0, ans=0.125 2024-08-10 16:52:08,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=653720.0, ans=0.07 2024-08-10 16:52:15,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=653820.0, ans=0.125 2024-08-10 16:52:23,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-08-10 16:52:31,569 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 33 from Vox, 36 from AS 2024-08-10 16:52:35,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.867e+01 3.212e+01 3.714e+01 5.750e+01, threshold=6.424e+01, percent-clipped=0.0 2024-08-10 16:52:45,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=654020.0, ans=0.125 2024-08-10 16:52:50,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-08-10 16:53:04,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7450, loss[loss=0.1274, beats_loss=0.01226, ecapa_loss=0.0002176, whisper_loss=0.113, over 23329.00 frames.
], tot_loss[loss=0.1096, beats_loss=0.01178, ecapa_loss=0.0002365, whisper_loss=0.09545, over 3918506.61 frames. ], batch size: 92, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:53:07,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=654220.0, ans=0.125 2024-08-10 16:53:08,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654220.0, ans=0.1 2024-08-10 16:53:10,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-10 16:53:22,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=654320.0, ans=0.0 2024-08-10 16:53:30,145 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 24 from Vox, 25 from AS 2024-08-10 16:53:31,458 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-10 16:53:34,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=654420.0, ans=15.0 2024-08-10 16:53:35,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-10 16:53:36,585 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-10 16:53:41,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=654420.0, ans=10.0 2024-08-10 16:53:54,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=654520.0, ans=0.0 2024-08-10 16:54:03,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654620.0, ans=0.1 2024-08-10 16:54:11,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7500, loss[loss=0.1044, beats_loss=0.01315, ecapa_loss=0.000177, whisper_loss=0.08949, over 17215.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01167, ecapa_loss=0.0002376, whisper_loss=0.09597, over 3892704.10 frames. ], batch size: 66, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:54:15,016 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 16:54:24,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=654820.0, ans=0.0 2024-08-10 16:54:29,072 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:54:35,216 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 16:54:37,724 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-10 16:54:43,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654920.0, ans=0.1 2024-08-10 16:54:48,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.920e+01 3.227e+01 3.863e+01 6.212e+01, threshold=6.454e+01, percent-clipped=0.0 2024-08-10 16:54:55,179 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts.
32 from LS+wenet, 12 from Vox, 15 from AS 2024-08-10 16:55:04,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=655120.0, ans=0.125 2024-08-10 16:55:17,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7550, loss[loss=0.1156, beats_loss=0.01027, ecapa_loss=0.0002417, whisper_loss=0.1029, over 17130.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01168, ecapa_loss=0.0002382, whisper_loss=0.09548, over 3842625.81 frames. ], batch size: 66, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:55:32,360 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-10 16:55:47,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=655420.0, ans=0.125 2024-08-10 16:55:48,092 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 from AS 2024-08-10 16:55:51,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2024-08-10 16:56:07,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=22.5 2024-08-10 16:56:18,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=655620.0, ans=0.0 2024-08-10 16:56:20,832 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 16:56:24,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7600, loss[loss=0.08761, beats_loss=0.01293, ecapa_loss=0.0002515, whisper_loss=0.07216, over 15912.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01168, ecapa_loss=0.000239, whisper_loss=0.09577, over 3853630.56 frames.
], batch size: 65, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:56:35,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=655720.0, ans=0.0 2024-08-10 16:56:42,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=655820.0, ans=0.0 2024-08-10 16:56:47,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=655820.0, ans=0.125 2024-08-10 16:57:02,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.834e+01 3.413e+01 3.883e+01 8.700e+01, threshold=6.826e+01, percent-clipped=1.0 2024-08-10 16:57:06,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=656020.0, ans=0.125 2024-08-10 16:57:14,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=656020.0, ans=0.0 2024-08-10 16:57:15,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=656020.0, ans=0.125 2024-08-10 16:57:20,293 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 from AS 2024-08-10 16:57:23,183 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 24 from Vox, 22 from AS 2024-08-10 16:57:31,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7650, loss[loss=0.1203, beats_loss=0.009781, ecapa_loss=0.000237, whisper_loss=0.1082, over 20295.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01164, ecapa_loss=0.0002394, whisper_loss=0.09484, over 3816297.91 frames.
], batch size: 77, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:57:33,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=656220.0, ans=0.0 2024-08-10 16:57:53,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=656320.0, ans=0.125 2024-08-10 16:57:59,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=656420.0, ans=0.125 2024-08-10 16:58:32,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656620.0, ans=0.1 2024-08-10 16:58:37,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7700, loss[loss=0.122, beats_loss=0.01139, ecapa_loss=0.0002423, whisper_loss=0.1082, over 17905.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01162, ecapa_loss=0.00024, whisper_loss=0.09534, over 3859071.06 frames. ], batch size: 71, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:58:40,041 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 16:58:41,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2024-08-10 16:58:52,903 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 33 from Vox, 27 from AS 2024-08-10 16:58:53,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656820.0, ans=0.1 2024-08-10 16:58:53,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.73 vs.
limit=22.5 2024-08-10 16:58:56,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-10 16:58:58,478 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 from AS 2024-08-10 16:59:15,230 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.911e+01 3.362e+01 3.849e+01 6.405e+01, threshold=6.723e+01, percent-clipped=0.0 2024-08-10 16:59:15,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656920.0, ans=0.1 2024-08-10 16:59:22,992 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 18 from LS+wenet, 30 from Vox, 39 from AS 2024-08-10 16:59:27,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657020.0, ans=0.1 2024-08-10 16:59:29,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=657120.0, ans=0.0 2024-08-10 16:59:43,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=657220.0, ans=0.0 2024-08-10 16:59:44,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7750, loss[loss=0.1134, beats_loss=0.01272, ecapa_loss=0.0002083, whisper_loss=0.09863, over 23127.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01162, ecapa_loss=0.0002403, whisper_loss=0.0956, over 3858388.03 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 16:59:51,394 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
29 from LS+wenet, 20 from Vox, 38 from AS 2024-08-10 16:59:56,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657220.0, ans=0.1 2024-08-10 16:59:57,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657320.0, ans=0.125 2024-08-10 16:59:58,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=657320.0, ans=0.2 2024-08-10 17:00:06,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=657320.0, ans=0.125 2024-08-10 17:00:09,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=657320.0, ans=0.125 2024-08-10 17:00:10,371 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 from AS 2024-08-10 17:00:39,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=657620.0, ans=0.0 2024-08-10 17:00:51,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7800, loss[loss=0.1044, beats_loss=0.01391, ecapa_loss=0.0002024, whisper_loss=0.08844, over 22688.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01165, ecapa_loss=0.0002388, whisper_loss=0.09609, over 3867661.61 frames. ], batch size: 91, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:00:51,837 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.174e-02 2024-08-10 17:01:07,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs.
limit=12.0 2024-08-10 17:01:12,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=657820.0, ans=0.125 2024-08-10 17:01:16,607 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 from AS 2024-08-10 17:01:21,021 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:01:28,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.784e+01 3.058e+01 3.552e+01 6.431e+01, threshold=6.115e+01, percent-clipped=0.0 2024-08-10 17:01:29,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0 2024-08-10 17:01:36,378 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 17:01:37,866 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 from AS 2024-08-10 17:01:57,337 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7850, loss[loss=0.09117, beats_loss=0.01264, ecapa_loss=0.0002596, whisper_loss=0.07594, over 20514.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01173, ecapa_loss=0.0002379, whisper_loss=0.09619, over 3898727.91 frames. ], batch size: 86, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:02:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=658220.0, ans=0.2 2024-08-10 17:02:14,767 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts.
23 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 17:02:38,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=658520.0, ans=0.0 2024-08-10 17:02:45,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658520.0, ans=0.1 2024-08-10 17:02:45,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2024-08-10 17:02:50,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=658620.0, ans=0.2 2024-08-10 17:03:04,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7900, loss[loss=0.12, beats_loss=0.01092, ecapa_loss=0.000252, whisper_loss=0.1066, over 22853.00 frames. ], tot_loss[loss=0.111, beats_loss=0.01174, ecapa_loss=0.0002376, whisper_loss=0.09687, over 3890761.93 frames. ], batch size: 93, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:03:13,645 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 17:03:19,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=658820.0, ans=0.125 2024-08-10 17:03:23,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=658820.0, ans=0.2 2024-08-10 17:03:29,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=658820.0, ans=0.0 2024-08-10 17:03:42,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.841e+01 3.204e+01 3.801e+01 5.785e+01, threshold=6.407e+01, percent-clipped=0.0 2024-08-10 17:04:01,030 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts.
17 from LS+wenet, 20 from Vox, 25 from AS 2024-08-10 17:04:12,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 7950, loss[loss=0.1289, beats_loss=0.008721, ecapa_loss=0.0002361, whisper_loss=0.1178, over 16015.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01173, ecapa_loss=0.0002377, whisper_loss=0.09676, over 3874286.21 frames. ], batch size: 61, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:04:12,688 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 21 from Vox, 31 from AS 2024-08-10 17:04:19,594 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 17:04:25,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2024-08-10 17:04:26,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=659320.0, ans=0.0 2024-08-10 17:04:36,660 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 17:04:37,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=659320.0, ans=0.05 2024-08-10 17:04:50,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659420.0, ans=0.1 2024-08-10 17:05:02,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=659520.0, ans=0.125 2024-08-10 17:05:08,152 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 from AS 2024-08-10 17:05:13,430 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
30 from LS+wenet, 29 from Vox, 34 from AS 2024-08-10 17:05:13,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=659620.0, ans=0.125 2024-08-10 17:05:19,001 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8000, loss[loss=0.09419, beats_loss=0.01451, ecapa_loss=0.0002014, whisper_loss=0.07767, over 17277.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01178, ecapa_loss=0.0002369, whisper_loss=0.09608, over 3869334.99 frames. ], batch size: 69, lr: 1.23e-02, grad_scale: 137438953472.0 2024-08-10 17:05:27,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.32 vs. limit=15.0 2024-08-10 17:05:30,672 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 17:05:40,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=659820.0, ans=0.2 2024-08-10 17:05:45,579 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 from AS 2024-08-10 17:05:55,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.761e+01 3.157e+01 3.536e+01 5.933e+01, threshold=6.314e+01, percent-clipped=0.0 2024-08-10 17:06:09,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=660020.0, ans=0.125 2024-08-10 17:06:24,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=660220.0, ans=0.2 2024-08-10 17:06:25,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8050, loss[loss=0.1189, beats_loss=0.009828, ecapa_loss=0.000313, whisper_loss=0.106, over 21055.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01181, ecapa_loss=0.0002376, whisper_loss=0.09591, over 3872041.64 frames.
], batch size: 89, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:06:31,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=660220.0, ans=0.0 2024-08-10 17:06:47,119 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 from AS 2024-08-10 17:06:50,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-10 17:06:58,836 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 17:07:15,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=660520.0, ans=0.0 2024-08-10 17:07:19,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=660620.0, ans=0.125 2024-08-10 17:07:29,904 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 17:07:30,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=660620.0, ans=0.0 2024-08-10 17:07:32,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8100, loss[loss=0.1097, beats_loss=0.01001, ecapa_loss=0.0002799, whisper_loss=0.09689, over 20387.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01182, ecapa_loss=0.0002377, whisper_loss=0.09504, over 3861684.25 frames. ], batch size: 81, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:07:34,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=660720.0, ans=0.125 2024-08-10 17:07:54,840 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts.
20 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 17:07:57,420 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 28 from Vox, 36 from AS 2024-08-10 17:08:07,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660920.0, ans=0.0 2024-08-10 17:08:09,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.280e+01 3.015e+01 3.254e+01 3.867e+01 1.141e+02, threshold=6.509e+01, percent-clipped=2.0 2024-08-10 17:08:14,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661020.0, ans=0.125 2024-08-10 17:08:15,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-08-10 17:08:23,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661020.0, ans=0.1 2024-08-10 17:08:23,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-10 17:08:26,810 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 from AS 2024-08-10 17:08:30,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=661120.0, ans=0.0 2024-08-10 17:08:32,414 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 23 from Vox, 22 from AS 2024-08-10 17:08:38,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8150, loss[loss=0.1033, beats_loss=0.01127, ecapa_loss=0.0002048, whisper_loss=0.08999, over 16186.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0117, ecapa_loss=0.0002379, whisper_loss=0.0962, over 3869811.21 frames.
], batch size: 60, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:07,357 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 from AS 2024-08-10 17:09:15,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=661420.0, ans=0.5 2024-08-10 17:09:20,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=661520.0, ans=0.125 2024-08-10 17:09:26,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=661520.0, ans=0.125 2024-08-10 17:09:26,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-10 17:09:39,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=661620.0, ans=0.0 2024-08-10 17:09:39,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=661620.0, ans=0.2 2024-08-10 17:09:45,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8200, loss[loss=0.1224, beats_loss=0.009804, ecapa_loss=0.0002258, whisper_loss=0.1104, over 18196.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01171, ecapa_loss=0.0002379, whisper_loss=0.09618, over 3897025.72 frames. ], batch size: 70, lr: 1.23e-02, grad_scale: 274877906944.0 2024-08-10 17:09:48,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:09:49,349 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
27 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 17:09:54,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=661720.0, ans=0.0 2024-08-10 17:10:02,489 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 17:10:04,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=661820.0, ans=0.125 2024-08-10 17:10:05,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661820.0, ans=0.0 2024-08-10 17:10:07,682 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 17:10:15,423 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 17:10:21,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.877e+01 3.347e+01 3.681e+01 6.491e+01, threshold=6.694e+01, percent-clipped=0.0 2024-08-10 17:10:29,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.33 vs. limit=15.0 2024-08-10 17:10:32,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=662020.0, ans=0.0 2024-08-10 17:10:45,721 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 17:10:47,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662120.0, ans=0.1 2024-08-10 17:10:50,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8250, loss[loss=0.1317, beats_loss=0.01016, ecapa_loss=0.0002176, whisper_loss=0.1194, over 23404.00 frames.
], tot_loss[loss=0.1102, beats_loss=0.01172, ecapa_loss=0.0002377, whisper_loss=0.09609, over 3915806.31 frames. ], batch size: 88, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:10:54,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=662220.0, ans=0.125 2024-08-10 17:11:09,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=662320.0, ans=0.025 2024-08-10 17:11:12,587 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 from AS 2024-08-10 17:11:19,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=662420.0, ans=0.125 2024-08-10 17:11:32,648 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 from AS 2024-08-10 17:11:32,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=662520.0, ans=0.1 2024-08-10 17:11:39,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=12.0 2024-08-10 17:11:41,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=662520.0, ans=0.2 2024-08-10 17:11:52,417 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 from AS 2024-08-10 17:11:57,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8300, loss[loss=0.1197, beats_loss=0.01071, ecapa_loss=0.0002897, whisper_loss=0.1061, over 21812.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01168, ecapa_loss=0.000237, whisper_loss=0.09638, over 3905984.14 frames.
], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:12:03,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=662720.0, ans=0.0 2024-08-10 17:12:15,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662820.0, ans=0.1 2024-08-10 17:12:19,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-10 17:12:24,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662920.0, ans=0.0 2024-08-10 17:12:34,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+01 2.908e+01 3.363e+01 4.143e+01 6.461e+01, threshold=6.726e+01, percent-clipped=0.0 2024-08-10 17:12:49,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-08-10 17:13:04,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8350, loss[loss=0.1224, beats_loss=0.01192, ecapa_loss=0.0002527, whisper_loss=0.108, over 23163.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01173, ecapa_loss=0.0002363, whisper_loss=0.09637, over 3903950.24 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:13:06,166 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-10 17:13:08,926 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-10 17:13:10,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0 2024-08-10 17:13:17,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=663320.0, ans=0.125 2024-08-10 17:13:19,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=663320.0, ans=0.2 2024-08-10 17:13:30,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=663320.0, ans=22.5 2024-08-10 17:13:43,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=663420.0, ans=0.0 2024-08-10 17:13:56,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=663520.0, ans=10.0 2024-08-10 17:14:15,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8400, loss[loss=0.1083, beats_loss=0.01373, ecapa_loss=0.0002628, whisper_loss=0.09198, over 21346.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01174, ecapa_loss=0.0002381, whisper_loss=0.09693, over 3945494.80 frames. ], batch size: 88, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:14:20,741 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-10 17:14:20,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=663720.0, ans=0.2 2024-08-10 17:14:26,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=663720.0, ans=0.0 2024-08-10 17:14:32,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2024-08-10 17:14:47,654 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-10 17:14:56,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.836e+01 3.172e+01 3.671e+01 5.154e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-10 17:15:14,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=664120.0, ans=0.125 2024-08-10 17:15:23,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-08-10 17:15:24,904 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 17:15:29,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8450, loss[loss=0.1161, beats_loss=0.01468, ecapa_loss=0.0001909, whisper_loss=0.09951, over 23417.00 frames. ], tot_loss[loss=0.1116, beats_loss=0.01165, ecapa_loss=0.0002393, whisper_loss=0.09757, over 3915394.09 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:15:44,745 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 17:15:52,308 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 20 from Vox, 16 fro AS 2024-08-10 17:15:55,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664320.0, ans=0.125 2024-08-10 17:16:01,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-08-10 17:16:02,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.67 vs. 
limit=15.0 2024-08-10 17:16:07,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2024-08-10 17:16:15,207 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 15 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 17:16:19,929 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 17:16:32,704 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-10 17:16:34,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664620.0, ans=0.1 2024-08-10 17:16:38,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664620.0, ans=0.0 2024-08-10 17:16:41,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664720.0, ans=0.0 2024-08-10 17:16:42,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8500, loss[loss=0.08938, beats_loss=0.01443, ecapa_loss=0.0002454, whisper_loss=0.07249, over 20689.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01165, ecapa_loss=0.0002394, whisper_loss=0.09681, over 3905352.95 frames. ], batch size: 86, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:16:47,545 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 17:16:54,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=22.5 2024-08-10 17:17:03,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. 
limit=15.0 2024-08-10 17:17:06,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-10 17:17:26,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.856e+01 3.264e+01 3.786e+01 7.141e+01, threshold=6.528e+01, percent-clipped=1.0 2024-08-10 17:17:33,901 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 17:17:40,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=665020.0, ans=0.125 2024-08-10 17:17:51,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=665120.0, ans=0.125 2024-08-10 17:18:00,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8550, loss[loss=0.1029, beats_loss=0.01456, ecapa_loss=0.0002486, whisper_loss=0.08581, over 21797.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01161, ecapa_loss=0.0002401, whisper_loss=0.09751, over 3924738.50 frames. 
], batch size: 92, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:18:14,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=665320.0, ans=0.0 2024-08-10 17:18:14,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=665320.0, ans=0.07 2024-08-10 17:18:19,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=665320.0, ans=0.2 2024-08-10 17:18:22,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=665320.0, ans=0.1 2024-08-10 17:18:24,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=665320.0, ans=0.2 2024-08-10 17:18:29,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=665420.0, ans=0.2 2024-08-10 17:18:44,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665520.0, ans=0.1 2024-08-10 17:18:44,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=665520.0, ans=0.125 2024-08-10 17:18:55,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-10 17:18:58,486 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 17:19:12,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=665620.0, ans=0.0 2024-08-10 17:19:15,293 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 17:19:16,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8600, loss[loss=0.09756, beats_loss=0.01391, ecapa_loss=0.0002395, whisper_loss=0.08126, over 21998.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01167, ecapa_loss=0.0002395, whisper_loss=0.097, over 3910042.43 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:19:18,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665720.0, ans=0.1 2024-08-10 17:19:19,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665720.0, ans=0.1 2024-08-10 17:19:28,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=12.0 2024-08-10 17:19:33,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=665820.0, ans=0.125 2024-08-10 17:19:46,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=665820.0, ans=0.2 2024-08-10 17:20:05,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.897e+01 3.260e+01 3.635e+01 5.528e+01, threshold=6.520e+01, percent-clipped=0.0 2024-08-10 17:20:20,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=666020.0, ans=0.125 2024-08-10 17:20:20,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=666020.0, ans=0.025 2024-08-10 17:20:24,379 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 17:20:26,966 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 17:20:41,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8650, loss[loss=0.1242, beats_loss=0.01259, ecapa_loss=0.0002034, whisper_loss=0.1096, over 21965.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01177, ecapa_loss=0.0002384, whisper_loss=0.09624, over 3869241.47 frames. ], batch size: 88, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:20:46,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=666220.0, ans=0.0 2024-08-10 17:20:50,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666220.0, ans=0.0 2024-08-10 17:20:53,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-10 17:20:55,647 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-10 17:21:26,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=666420.0, ans=0.0 2024-08-10 17:21:43,917 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-10 17:21:53,354 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-10 17:22:14,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8700, loss[loss=0.1125, beats_loss=0.01228, ecapa_loss=0.0001866, whisper_loss=0.09832, over 23066.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01181, ecapa_loss=0.0002386, whisper_loss=0.09567, over 3872696.10 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:22:47,113 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-10 17:23:08,216 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 17:23:15,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=666920.0, ans=0.2 2024-08-10 17:23:16,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.246e+01 2.997e+01 3.503e+01 4.044e+01 1.535e+02, threshold=7.007e+01, percent-clipped=1.0 2024-08-10 17:23:36,569 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-10 17:23:58,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8750, loss[loss=0.0861, beats_loss=0.01383, ecapa_loss=0.0001884, whisper_loss=0.07038, over 18704.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01174, ecapa_loss=0.0002411, whisper_loss=0.09562, over 3875690.23 frames. ], batch size: 76, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:24:26,294 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-10 17:24:40,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-10 17:24:42,155 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-10 17:24:42,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. 
limit=22.5 2024-08-10 17:25:08,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=667520.0, ans=0.0 2024-08-10 17:25:48,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=667620.0, ans=15.0 2024-08-10 17:25:52,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=667620.0, ans=0.0 2024-08-10 17:25:57,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8800, loss[loss=0.1056, beats_loss=0.01181, ecapa_loss=0.0002234, whisper_loss=0.09153, over 19074.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.0118, ecapa_loss=0.0002389, whisper_loss=0.09533, over 3865856.48 frames. ], batch size: 77, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:26:57,018 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-10 17:27:04,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.842e+01 3.119e+01 3.570e+01 8.103e+01, threshold=6.239e+01, percent-clipped=1.0 2024-08-10 17:27:25,674 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 31 from Vox, 17 fro AS 2024-08-10 17:28:03,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8850, loss[loss=0.1071, beats_loss=0.0109, ecapa_loss=0.0002921, whisper_loss=0.0933, over 19760.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01185, ecapa_loss=0.0002375, whisper_loss=0.09512, over 3850293.45 frames. ], batch size: 83, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:28:07,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=668220.0, ans=0.07 2024-08-10 17:28:25,610 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 17:28:33,502 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-10 17:28:43,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.30 vs. limit=22.5 2024-08-10 17:28:51,257 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 17:29:25,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=668520.0, ans=0.2 2024-08-10 17:29:53,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8900, loss[loss=0.09392, beats_loss=0.009459, ecapa_loss=0.0002521, whisper_loss=0.08194, over 14391.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01189, ecapa_loss=0.0002381, whisper_loss=0.09429, over 3816226.79 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:30:00,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=668720.0, ans=0.125 2024-08-10 17:30:33,365 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-10 17:30:36,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.696e+01 3.082e+01 3.587e+01 7.840e+01, threshold=6.164e+01, percent-clipped=1.0 2024-08-10 17:30:39,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.70 vs. limit=6.0 2024-08-10 17:30:41,161 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 17:31:00,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669120.0, ans=0.125 2024-08-10 17:31:11,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 8950, loss[loss=0.1138, beats_loss=0.01132, ecapa_loss=0.0002448, whisper_loss=0.09998, over 22751.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01193, ecapa_loss=0.0002363, whisper_loss=0.09431, over 3857349.10 frames. ], batch size: 88, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:31:11,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669220.0, ans=0.125 2024-08-10 17:31:25,547 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 17:31:35,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669320.0, ans=0.1 2024-08-10 17:32:28,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9000, loss[loss=0.1105, beats_loss=0.01248, ecapa_loss=0.0002346, whisper_loss=0.09565, over 15707.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01191, ecapa_loss=0.0002362, whisper_loss=0.09421, over 3847187.38 frames. ], batch size: 62, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:32:28,710 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 17:32:40,072 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6575, 4.5285, 3.9294, 3.8142], device='cuda:0') 2024-08-10 17:33:04,134 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on ASR_libri: loss=0.2625, beats_loss=0, ecapa_loss=0.0007367, whisper_loss=0.2551, over 922467.00 frames. 
2024-08-10 17:33:20,359 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on SV_voxceleb1: loss=0.006282, beats_loss=0, ecapa_loss=0.0006282, whisper_loss=0, over 939242.00 frames. 2024-08-10 17:34:01,954 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5194, 1.6565, 1.5406, 1.1866], device='cuda:0') 2024-08-10 17:35:05,156 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on AT_audioset: loss=0.02673, beats_loss=0.02673, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 17:35:05,160 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 17:35:38,371 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 17:35:40,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.65 vs. limit=10.0 2024-08-10 17:35:47,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.797e+01 3.113e+01 3.593e+01 8.640e+01, threshold=6.226e+01, percent-clipped=2.0 2024-08-10 17:35:52,794 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-10 17:36:03,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670020.0, ans=0.1 2024-08-10 17:36:20,116 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-10 17:36:21,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9050, loss[loss=0.09657, beats_loss=0.01179, ecapa_loss=0.0002555, whisper_loss=0.08222, over 22917.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01182, ecapa_loss=0.0002376, whisper_loss=0.09561, over 3859044.86 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:36:34,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=12.0 2024-08-10 17:36:37,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-10 17:36:38,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2024-08-10 17:36:41,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=670320.0, ans=0.125 2024-08-10 17:36:43,925 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 17:36:57,546 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 17:37:00,201 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 30 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-10 17:37:24,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=670620.0, ans=0.125 2024-08-10 17:37:34,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=670720.0, ans=0.07 2024-08-10 17:37:35,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9100, loss[loss=0.1295, beats_loss=0.009686, ecapa_loss=0.0002323, whisper_loss=0.1175, over 23997.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01163, ecapa_loss=0.0002392, whisper_loss=0.09635, over 3810641.31 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:37:49,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=670820.0, ans=0.0 2024-08-10 17:37:58,073 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-10 17:38:04,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670920.0, ans=0.1 2024-08-10 17:38:15,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=670920.0, ans=0.125 2024-08-10 17:38:16,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.910e+01 3.253e+01 3.723e+01 6.048e+01, threshold=6.507e+01, percent-clipped=0.0 2024-08-10 17:38:33,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=671120.0, ans=0.125 2024-08-10 17:38:38,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=671120.0, ans=0.125 2024-08-10 17:38:49,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9150, loss[loss=0.1246, beats_loss=0.0115, ecapa_loss=0.0002321, whisper_loss=0.1108, over 21440.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01174, ecapa_loss=0.0002356, whisper_loss=0.09617, over 3837463.62 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:38:49,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=671220.0, ans=0.125 2024-08-10 17:39:06,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. 
limit=6.0 2024-08-10 17:39:18,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2024-08-10 17:39:38,170 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-10 17:39:41,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671520.0, ans=0.1 2024-08-10 17:40:07,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2024-08-10 17:40:09,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9200, loss[loss=0.09341, beats_loss=0.01352, ecapa_loss=0.0002498, whisper_loss=0.07739, over 21674.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01187, ecapa_loss=0.0002348, whisper_loss=0.09538, over 3878171.71 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:40:14,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.865e-01 2024-08-10 17:40:22,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=671720.0, ans=0.0 2024-08-10 17:40:23,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=671720.0, ans=0.2 2024-08-10 17:40:26,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2024-08-10 17:40:39,139 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-10 17:40:39,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=671820.0, ans=0.125 2024-08-10 17:40:42,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=671920.0, ans=0.125 2024-08-10 17:40:50,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=671920.0, ans=0.0 2024-08-10 17:40:53,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.732e+01 3.100e+01 3.483e+01 6.432e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 17:40:53,634 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-10 17:40:54,843 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 17:40:55,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2024-08-10 17:41:07,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=672020.0, ans=0.125 2024-08-10 17:41:11,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2024-08-10 17:41:27,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9250, loss[loss=0.1018, beats_loss=0.01182, ecapa_loss=0.0001923, whisper_loss=0.08803, over 17884.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01177, ecapa_loss=0.0002363, whisper_loss=0.09523, over 3868026.14 frames. ], batch size: 66, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:41:36,195 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-10 17:41:49,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=12.0 2024-08-10 17:41:49,654 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 17:41:59,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=672420.0, ans=0.0 2024-08-10 17:42:10,146 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-10 17:42:22,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0 2024-08-10 17:42:41,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=672720.0, ans=0.0 2024-08-10 17:42:42,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9300, loss[loss=0.1233, beats_loss=0.01204, ecapa_loss=0.0002033, whisper_loss=0.1092, over 18837.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01186, ecapa_loss=0.0002345, whisper_loss=0.09499, over 3874010.34 frames. ], batch size: 73, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:42:46,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=672720.0, ans=0.2 2024-08-10 17:42:46,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=672720.0, ans=0.0 2024-08-10 17:42:47,487 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 17:42:57,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=672820.0, ans=0.125 2024-08-10 17:43:09,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=672820.0, ans=0.125 2024-08-10 17:43:28,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.985e+01 3.331e+01 3.923e+01 7.099e+01, threshold=6.662e+01, percent-clipped=2.0 2024-08-10 17:43:30,189 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.370e-01 2024-08-10 17:43:35,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=673020.0, ans=0.125 2024-08-10 17:43:40,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=673020.0, ans=0.2 2024-08-10 17:43:48,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2024-08-10 17:44:05,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9350, loss[loss=0.1043, beats_loss=0.01157, ecapa_loss=0.0002473, whisper_loss=0.09029, over 20934.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01184, ecapa_loss=0.0002357, whisper_loss=0.09482, over 3865277.06 frames. 
], batch size: 87, lr: 1.22e-02, grad_scale: 274877906944.0 2024-08-10 17:44:28,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=673320.0, ans=0.1 2024-08-10 17:44:31,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=673320.0, ans=0.125 2024-08-10 17:44:36,056 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 17:44:37,318 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-10 17:44:50,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-10 17:44:53,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=673520.0, ans=0.09899494936611666 2024-08-10 17:44:56,526 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-10 17:44:57,857 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-10 17:44:58,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=673520.0, ans=0.125 2024-08-10 17:44:59,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673520.0, ans=0.1 2024-08-10 17:45:08,778 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
21 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-10 17:45:13,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=673620.0, ans=0.0 2024-08-10 17:45:22,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9400, loss[loss=0.1108, beats_loss=0.0136, ecapa_loss=0.0002286, whisper_loss=0.09487, over 22299.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01191, ecapa_loss=0.0002335, whisper_loss=0.09493, over 3892974.47 frames. ], batch size: 90, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:45:28,902 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-10 17:45:29,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-10 17:45:33,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=673720.0, ans=0.125 2024-08-10 17:45:42,699 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-10 17:46:02,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=673920.0, ans=0.0 2024-08-10 17:46:05,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.795e+01 3.115e+01 3.725e+01 7.083e+01, threshold=6.231e+01, percent-clipped=1.0 2024-08-10 17:46:05,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=673920.0, ans=0.125 2024-08-10 17:46:14,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=674020.0, ans=0.0 2024-08-10 17:46:17,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=674020.0, ans=0.125 2024-08-10 17:46:27,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-08-10 17:46:34,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=674120.0, ans=0.0 2024-08-10 17:46:36,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9450, loss[loss=0.1022, beats_loss=0.01354, ecapa_loss=0.0002289, whisper_loss=0.08639, over 20083.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01195, ecapa_loss=0.0002346, whisper_loss=0.09421, over 3888083.79 frames. ], batch size: 80, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:46:55,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=674320.0, ans=0.125 2024-08-10 17:46:58,837 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 17:47:41,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=674620.0, ans=0.0 2024-08-10 17:47:44,648 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 17 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-10 17:47:48,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9500, loss[loss=0.1197, beats_loss=0.01347, ecapa_loss=0.0002042, whisper_loss=0.1042, over 22141.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01197, ecapa_loss=0.0002361, whisper_loss=0.09415, over 3917644.53 frames. ], batch size: 87, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:47:54,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=674720.0, ans=0.0 2024-08-10 17:48:09,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674820.0, ans=0.0 2024-08-10 17:48:13,837 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 17:48:15,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674820.0, ans=0.1 2024-08-10 17:48:33,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.877e+01 3.250e+01 3.723e+01 7.953e+01, threshold=6.499e+01, percent-clipped=3.0 2024-08-10 17:48:38,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=675020.0, ans=0.0 2024-08-10 17:48:46,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675020.0, ans=0.1 2024-08-10 17:48:53,632 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 17:49:05,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9550, loss[loss=0.1241, beats_loss=0.01034, ecapa_loss=0.0002177, whisper_loss=0.1116, over 23377.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01189, ecapa_loss=0.0002362, whisper_loss=0.09438, over 3889370.92 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:49:05,694 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 17:49:46,454 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-10 17:49:51,119 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-10 17:49:52,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675520.0, ans=0.1 2024-08-10 17:49:52,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=675520.0, ans=0.07 2024-08-10 17:49:56,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-10 17:50:17,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=22.5 2024-08-10 17:50:21,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9600, loss[loss=0.1058, beats_loss=0.01215, ecapa_loss=0.0002347, whisper_loss=0.09134, over 14576.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01182, ecapa_loss=0.000237, whisper_loss=0.09422, over 3851889.37 frames. 
], batch size: 57, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:50:40,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=675820.0, ans=0.05 2024-08-10 17:50:46,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=675820.0, ans=0.125 2024-08-10 17:50:48,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=675820.0, ans=0.125 2024-08-10 17:50:48,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2024-08-10 17:50:53,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=675920.0, ans=0.0 2024-08-10 17:50:58,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=675920.0, ans=0.2 2024-08-10 17:51:02,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.698e+01 2.997e+01 3.348e+01 4.884e+01, threshold=5.995e+01, percent-clipped=0.0 2024-08-10 17:51:19,799 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 17:51:27,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.09 vs. limit=15.0 2024-08-10 17:51:32,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9650, loss[loss=0.1159, beats_loss=0.01179, ecapa_loss=0.0001847, whisper_loss=0.1023, over 17043.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01175, ecapa_loss=0.000235, whisper_loss=0.09482, over 3827098.83 frames. 
], batch size: 63, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:51:32,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=676220.0, ans=0.0 2024-08-10 17:51:37,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=676220.0, ans=0.0 2024-08-10 17:51:37,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=676220.0, ans=0.0 2024-08-10 17:51:53,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=676320.0, ans=0.2 2024-08-10 17:52:14,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=676520.0, ans=0.125 2024-08-10 17:52:29,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=676620.0, ans=0.0 2024-08-10 17:52:39,787 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 17:52:41,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676620.0, ans=0.1 2024-08-10 17:52:44,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=676720.0, ans=0.0 2024-08-10 17:52:45,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9700, loss[loss=0.09938, beats_loss=0.01174, ecapa_loss=0.0002346, whisper_loss=0.0853, over 21104.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01182, ecapa_loss=0.0002353, whisper_loss=0.09437, over 3838860.09 frames. ], batch size: 88, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:52:46,978 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 17:52:54,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=676720.0, ans=0.2 2024-08-10 17:52:57,001 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 17:53:04,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676820.0, ans=0.1 2024-08-10 17:53:10,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=676820.0, ans=0.2 2024-08-10 17:53:14,291 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-10 17:53:24,171 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 17:53:26,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=676920.0, ans=0.1 2024-08-10 17:53:27,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.312e+01 2.851e+01 3.065e+01 3.509e+01 5.015e+01, threshold=6.131e+01, percent-clipped=0.0 2024-08-10 17:53:28,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0 2024-08-10 17:53:37,773 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-10 17:53:48,359 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 17:53:59,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9750, loss[loss=0.09318, beats_loss=0.01353, ecapa_loss=0.0001968, whisper_loss=0.07768, over 19043.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01181, ecapa_loss=0.000236, whisper_loss=0.0939, over 3848232.73 frames. 
], batch size: 75, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:54:06,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677220.0, ans=0.1 2024-08-10 17:54:14,409 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-10 17:54:17,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-10 17:54:18,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=677320.0, ans=0.125 2024-08-10 17:54:31,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=677420.0, ans=0.0 2024-08-10 17:54:44,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677520.0, ans=0.1 2024-08-10 17:54:50,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=12.0 2024-08-10 17:54:54,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-10 17:54:57,155 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-10 17:55:12,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9800, loss[loss=0.1119, beats_loss=0.009178, ecapa_loss=0.0002377, whisper_loss=0.1003, over 16387.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01182, ecapa_loss=0.0002347, whisper_loss=0.09365, over 3880526.80 frames. 
], batch size: 62, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:55:13,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677720.0, ans=0.1 2024-08-10 17:55:22,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=677720.0, ans=0.0 2024-08-10 17:55:29,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=677820.0, ans=0.0 2024-08-10 17:55:32,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-08-10 17:55:33,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=677820.0, ans=0.04949747468305833 2024-08-10 17:55:43,952 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-10 17:55:54,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.700e+01 3.065e+01 3.596e+01 6.450e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-10 17:56:08,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=678020.0, ans=0.04949747468305833 2024-08-10 17:56:19,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678120.0, ans=0.0 2024-08-10 17:56:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=678120.0, ans=0.125 2024-08-10 17:56:26,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9850, loss[loss=0.1214, beats_loss=0.0113, ecapa_loss=0.0002489, whisper_loss=0.1076, over 23351.00 frames. 
], tot_loss[loss=0.1086, beats_loss=0.01174, ecapa_loss=0.0002355, whisper_loss=0.09455, over 3870631.37 frames. ], batch size: 93, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:56:36,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=678220.0, ans=0.125 2024-08-10 17:56:58,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=678420.0, ans=0.1 2024-08-10 17:57:02,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678420.0, ans=0.1 2024-08-10 17:57:18,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-10 17:57:31,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=678620.0, ans=0.0 2024-08-10 17:57:36,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=678620.0, ans=0.125 2024-08-10 17:57:41,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9900, loss[loss=0.122, beats_loss=0.009887, ecapa_loss=0.0002582, whisper_loss=0.1095, over 23120.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01178, ecapa_loss=0.0002365, whisper_loss=0.09483, over 3907940.70 frames. ], batch size: 92, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:57:43,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-08-10 17:57:48,638 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 17:57:56,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2024-08-10 17:58:06,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=678820.0, ans=0.125 2024-08-10 17:58:07,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-08-10 17:58:15,940 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-10 17:58:19,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.121e+01 2.724e+01 3.027e+01 3.695e+01 5.994e+01, threshold=6.053e+01, percent-clipped=0.0 2024-08-10 17:58:35,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=679120.0, ans=0.125 2024-08-10 17:58:41,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=679120.0, ans=0.125 2024-08-10 17:58:49,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=679220.0, ans=0.2 2024-08-10 17:58:50,495 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 9950, loss[loss=0.1127, beats_loss=0.009965, ecapa_loss=0.0002736, whisper_loss=0.09995, over 15911.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01184, ecapa_loss=0.0002373, whisper_loss=0.09425, over 3896167.57 frames. ], batch size: 63, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 17:58:52,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=679220.0, ans=0.2 2024-08-10 17:58:55,987 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
15 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-10 17:58:57,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=679220.0, ans=0.125 2024-08-10 17:59:04,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=679320.0, ans=0.0 2024-08-10 17:59:12,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679320.0, ans=0.1 2024-08-10 17:59:18,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=679420.0, ans=0.09899494936611666 2024-08-10 17:59:26,843 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 17:59:30,908 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-10 17:59:45,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=679520.0, ans=0.0 2024-08-10 17:59:47,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=679620.0, ans=0.0 2024-08-10 17:59:54,997 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-10 18:00:01,224 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-10 18:00:04,312 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10000, loss[loss=0.1096, beats_loss=0.01146, ecapa_loss=0.0002104, whisper_loss=0.09603, over 23370.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01188, ecapa_loss=0.0002361, whisper_loss=0.0944, over 3891948.53 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 274877906944.0 2024-08-10 18:00:05,999 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 18:00:12,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=679720.0, ans=0.125 2024-08-10 18:00:16,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679720.0, ans=0.1 2024-08-10 18:00:24,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=679820.0, ans=10.0 2024-08-10 18:00:40,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=679920.0, ans=0.0 2024-08-10 18:00:43,404 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-68000.pt 2024-08-10 18:00:47,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.793e+01 3.118e+01 3.876e+01 5.816e+01, threshold=6.237e+01, percent-clipped=0.0 2024-08-10 18:01:07,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=680120.0, ans=0.0 2024-08-10 18:01:08,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=680120.0, ans=0.0 2024-08-10 18:01:11,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-08-10 18:01:17,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10050, loss[loss=0.1074, beats_loss=0.009545, ecapa_loss=0.0002626, whisper_loss=0.09524, over 18799.00 frames. 
], tot_loss[loss=0.1094, beats_loss=0.01183, ecapa_loss=0.0002345, whisper_loss=0.09522, over 3880717.89 frames. ], batch size: 76, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:01:20,936 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 18:01:41,312 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 18:01:54,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2024-08-10 18:01:59,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-10 18:02:13,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=680520.0, ans=0.125 2024-08-10 18:02:25,992 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 18:02:26,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=680620.0, ans=0.0 2024-08-10 18:02:29,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=680720.0, ans=0.0 2024-08-10 18:02:30,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10100, loss[loss=0.129, beats_loss=0.01121, ecapa_loss=0.0002177, whisper_loss=0.1156, over 23819.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01181, ecapa_loss=0.0002353, whisper_loss=0.09563, over 3900876.58 frames. 
], batch size: 91, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:02:35,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=680720.0, ans=0.0 2024-08-10 18:02:48,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=680820.0, ans=0.0 2024-08-10 18:03:02,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:03:05,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=680920.0, ans=0.02 2024-08-10 18:03:12,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.905e+01 3.182e+01 3.646e+01 5.979e+01, threshold=6.363e+01, percent-clipped=0.0 2024-08-10 18:03:15,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2024-08-10 18:03:48,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10150, loss[loss=0.0993, beats_loss=0.01317, ecapa_loss=0.0001888, whisper_loss=0.08424, over 14600.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01181, ecapa_loss=0.0002365, whisper_loss=0.0951, over 3899332.03 frames. ], batch size: 57, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:03:54,648 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 18:03:56,259 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-10 18:04:15,710 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 18:04:28,748 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
24 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 18:04:30,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=681420.0, ans=0.125 2024-08-10 18:04:30,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=681420.0, ans=0.0 2024-08-10 18:04:54,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=681620.0, ans=0.125 2024-08-10 18:05:04,253 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 18:05:09,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10200, loss[loss=0.1143, beats_loss=0.01065, ecapa_loss=0.0002141, whisper_loss=0.1015, over 21789.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01182, ecapa_loss=0.0002358, whisper_loss=0.09482, over 3929885.51 frames. ], batch size: 86, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:05:16,430 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-10 18:05:18,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=681720.0, ans=0.0 2024-08-10 18:05:30,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-10 18:05:35,281 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
36 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 18:05:54,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.852e+01 3.122e+01 3.821e+01 7.643e+01, threshold=6.244e+01, percent-clipped=3.0 2024-08-10 18:06:03,491 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.922e-01 2024-08-10 18:06:28,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10250, loss[loss=0.1106, beats_loss=0.01357, ecapa_loss=0.0001416, whisper_loss=0.09556, over 16983.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01181, ecapa_loss=0.0002346, whisper_loss=0.09478, over 3942774.22 frames. ], batch size: 62, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:06:33,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=682220.0, ans=0.015 2024-08-10 18:06:38,432 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-10 18:07:01,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-10 18:07:10,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-10 18:07:25,573 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 18:07:46,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10300, loss[loss=0.1161, beats_loss=0.0117, ecapa_loss=0.0002144, whisper_loss=0.1022, over 19821.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01183, ecapa_loss=0.0002347, whisper_loss=0.09492, over 3932583.90 frames. ], batch size: 79, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:07:59,690 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 17 from Vox, 30 from AS 2024-08-10 18:07:59,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682720.0, ans=0.1 2024-08-10 18:08:19,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=682920.0, ans=10.0 2024-08-10 18:08:28,264 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 18:08:29,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.976e+01 3.282e+01 3.794e+01 5.948e+01, threshold=6.564e+01, percent-clipped=0.0 2024-08-10 18:08:34,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683020.0, ans=0.125 2024-08-10 18:08:37,037 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 18:08:53,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.18 vs. limit=10.0 2024-08-10 18:08:57,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683120.0, ans=0.125 2024-08-10 18:09:02,147 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10350, loss[loss=0.08833, beats_loss=0.01255, ecapa_loss=0.0002157, whisper_loss=0.07363, over 18867.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01178, ecapa_loss=0.0002346, whisper_loss=0.09579, over 3952092.96 frames.
], batch size: 79, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:09:05,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=683220.0, ans=0.0 2024-08-10 18:09:37,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=683420.0, ans=0.0 2024-08-10 18:09:43,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=683420.0, ans=0.125 2024-08-10 18:09:51,337 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 18:09:52,775 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 from AS 2024-08-10 18:09:57,200 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 from AS 2024-08-10 18:10:04,074 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 from AS 2024-08-10 18:10:11,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5 2024-08-10 18:10:11,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.09 vs. limit=15.0 2024-08-10 18:10:20,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10400, loss[loss=0.1046, beats_loss=0.009582, ecapa_loss=0.0002905, whisper_loss=0.0921, over 17378.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01176, ecapa_loss=0.0002346, whisper_loss=0.0956, over 3921688.21 frames. ], batch size: 71, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:10:22,216 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts.
25 from LS+wenet, 17 from Vox, 42 from AS 2024-08-10 18:11:02,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.306e+01 2.804e+01 3.184e+01 3.674e+01 7.007e+01, threshold=6.369e+01, percent-clipped=1.0 2024-08-10 18:11:03,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=683920.0, ans=0.1 2024-08-10 18:11:04,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684020.0, ans=0.1 2024-08-10 18:11:05,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684020.0, ans=0.1 2024-08-10 18:11:20,961 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 16 from Vox, 33 from AS 2024-08-10 18:11:25,290 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 26 from Vox, 21 from AS 2024-08-10 18:11:27,846 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 from AS 2024-08-10 18:11:34,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10450, loss[loss=0.1254, beats_loss=0.01054, ecapa_loss=0.0002349, whisper_loss=0.1125, over 21142.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01171, ecapa_loss=0.0002346, whisper_loss=0.09528, over 3898961.62 frames. ], batch size: 83, lr: 1.21e-02, grad_scale: 549755813888.0 2024-08-10 18:11:37,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2024-08-10 18:11:43,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.21 vs.
limit=15.0 2024-08-10 18:11:49,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=684320.0, ans=0.125 2024-08-10 18:11:52,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=684320.0, ans=0.125 2024-08-10 18:11:58,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=684320.0, ans=0.1 2024-08-10 18:12:13,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=684420.0, ans=0.0 2024-08-10 18:12:41,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=12.0 2024-08-10 18:12:48,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.82 vs. limit=22.5 2024-08-10 18:12:51,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=684620.0, ans=0.125 2024-08-10 18:12:54,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10500, loss[loss=0.1136, beats_loss=0.009385, ecapa_loss=0.0002621, whisper_loss=0.1016, over 16497.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01179, ecapa_loss=0.0002343, whisper_loss=0.09493, over 3919185.69 frames. 
], batch size: 67, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:12:58,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=684720.0, ans=0.0 2024-08-10 18:13:09,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684820.0, ans=0.1 2024-08-10 18:13:10,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684820.0, ans=0.0 2024-08-10 18:13:13,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=684820.0, ans=0.0 2024-08-10 18:13:19,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=684820.0, ans=0.125 2024-08-10 18:13:20,217 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 18:13:22,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0 2024-08-10 18:13:35,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.920e+01 3.144e+01 3.815e+01 6.100e+01, threshold=6.288e+01, percent-clipped=0.0 2024-08-10 18:13:38,629 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 18:13:48,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=685020.0, ans=0.125 2024-08-10 18:13:53,405 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 from AS 2024-08-10 18:14:09,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10550, loss[loss=0.1206, beats_loss=0.01079, ecapa_loss=0.0002213, whisper_loss=0.1076, over 20598.00 frames.
], tot_loss[loss=0.1085, beats_loss=0.01182, ecapa_loss=0.0002347, whisper_loss=0.09432, over 3908973.14 frames. ], batch size: 82, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:14:31,951 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 26 from Vox, 27 from AS 2024-08-10 18:14:33,839 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 from AS 2024-08-10 18:14:35,394 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 21 from Vox, 47 from AS 2024-08-10 18:14:37,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2024-08-10 18:15:06,324 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 from AS 2024-08-10 18:15:13,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=685620.0, ans=0.0 2024-08-10 18:15:22,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=685620.0, ans=0.125 2024-08-10 18:15:22,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=685620.0, ans=0.125 2024-08-10 18:15:28,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10600, loss[loss=0.1026, beats_loss=0.01247, ecapa_loss=0.000268, whisper_loss=0.08744, over 18363.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0118, ecapa_loss=0.0002356, whisper_loss=0.09432, over 3891023.92 frames. ], batch size: 78, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:16:06,137 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts.
18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-10 18:16:06,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=685920.0, ans=0.0 2024-08-10 18:16:07,923 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 18:16:12,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.764e+01 3.108e+01 3.489e+01 4.887e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-10 18:16:15,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=686020.0, ans=0.0 2024-08-10 18:16:17,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=686020.0, ans=0.125 2024-08-10 18:16:18,907 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 13 from Vox, 27 from AS 2024-08-10 18:16:33,584 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 from AS 2024-08-10 18:16:35,417 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.528e-01 2024-08-10 18:16:37,732 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 18:16:41,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686120.0, ans=0.1 2024-08-10 18:16:46,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10650, loss[loss=0.09276, beats_loss=0.01221, ecapa_loss=0.0002722, whisper_loss=0.07783, over 21288.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01171, ecapa_loss=0.0002352, whisper_loss=0.09578, over 3918126.85 frames.
], batch size: 90, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:16:56,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=686220.0, ans=0.0 2024-08-10 18:17:30,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=686420.0, ans=0.125 2024-08-10 18:17:59,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=686620.0, ans=0.125 2024-08-10 18:18:04,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10700, loss[loss=0.1083, beats_loss=0.009368, ecapa_loss=0.0002743, whisper_loss=0.0962, over 15646.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01169, ecapa_loss=0.0002351, whisper_loss=0.09629, over 3915579.48 frames. ], batch size: 64, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:18:19,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=686820.0, ans=0.035 2024-08-10 18:18:29,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=686820.0, ans=0.0 2024-08-10 18:18:41,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=686920.0, ans=0.2 2024-08-10 18:18:44,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=686920.0, ans=0.05 2024-08-10 18:18:46,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=686920.0, ans=0.125 2024-08-10 18:18:47,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.883e+01 3.231e+01 3.765e+01 5.379e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-10 18:18:52,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, 
num_groups=1, num_channels=384, metric=8.83 vs. limit=12.0 2024-08-10 18:18:56,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=687020.0, ans=0.05 2024-08-10 18:19:09,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-08-10 18:19:13,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=687120.0, ans=0.125 2024-08-10 18:19:22,078 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 18:19:23,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10750, loss[loss=0.1208, beats_loss=0.01295, ecapa_loss=0.0002228, whisper_loss=0.1056, over 18319.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01171, ecapa_loss=0.0002333, whisper_loss=0.09685, over 3893352.00 frames. ], batch size: 72, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:19:33,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.20 vs.
limit=22.5 2024-08-10 18:19:34,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687220.0, ans=0.1 2024-08-10 18:19:36,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687220.0, ans=0.1 2024-08-10 18:19:38,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=687320.0, ans=0.125 2024-08-10 18:20:04,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=687420.0, ans=0.125 2024-08-10 18:20:07,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=687420.0, ans=0.0 2024-08-10 18:20:20,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=687520.0, ans=0.2 2024-08-10 18:20:25,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.69 vs. limit=10.0 2024-08-10 18:20:26,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687620.0, ans=0.1 2024-08-10 18:20:27,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=687620.0, ans=0.125 2024-08-10 18:20:35,466 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:20:40,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10800, loss[loss=0.1316, beats_loss=0.00991, ecapa_loss=0.0002732, whisper_loss=0.1189, over 22790.00 frames. ], tot_loss[loss=0.1109, beats_loss=0.01164, ecapa_loss=0.0002347, whisper_loss=0.09689, over 3887131.74 frames. 
], batch size: 92, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:21:23,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+01 2.760e+01 3.130e+01 3.473e+01 5.037e+01, threshold=6.260e+01, percent-clipped=0.0 2024-08-10 18:21:27,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=688020.0, ans=0.2 2024-08-10 18:21:29,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=688020.0, ans=0.2 2024-08-10 18:21:57,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10850, loss[loss=0.1135, beats_loss=0.01182, ecapa_loss=0.0002335, whisper_loss=0.09938, over 21324.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.0116, ecapa_loss=0.0002342, whisper_loss=0.09716, over 3880376.37 frames. ], batch size: 86, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:22:23,253 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 15 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 18:22:28,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=688420.0, ans=0.125 2024-08-10 18:22:38,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=688420.0, ans=0.0 2024-08-10 18:22:42,760 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 from AS 2024-08-10 18:22:52,035 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 from AS 2024-08-10 18:23:02,467 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
19 from LS+wenet, 25 from Vox, 41 from AS 2024-08-10 18:23:09,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=688620.0, ans=0.125 2024-08-10 18:23:10,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688620.0, ans=0.1 2024-08-10 18:23:15,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10900, loss[loss=0.1133, beats_loss=0.01004, ecapa_loss=0.0002495, whisper_loss=0.1008, over 17714.00 frames. ], tot_loss[loss=0.1115, beats_loss=0.01155, ecapa_loss=0.0002351, whisper_loss=0.09759, over 3897346.15 frames. ], batch size: 70, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:23:18,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=688720.0, ans=0.125 2024-08-10 18:23:36,294 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 18 from LS+wenet, 26 from Vox, 46 from AS 2024-08-10 18:23:39,575 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 from AS 2024-08-10 18:23:42,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs.
limit=15.0 2024-08-10 18:23:55,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=688920.0, ans=0.125 2024-08-10 18:24:02,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.842e+01 3.313e+01 3.977e+01 6.808e+01, threshold=6.627e+01, percent-clipped=2.0 2024-08-10 18:24:06,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=689020.0, ans=0.05 2024-08-10 18:24:17,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=689020.0, ans=0.125 2024-08-10 18:24:36,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 10950, loss[loss=0.119, beats_loss=0.0124, ecapa_loss=0.0002513, whisper_loss=0.1041, over 20942.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.01164, ecapa_loss=0.0002342, whisper_loss=0.09647, over 3898983.49 frames. ], batch size: 87, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:24:46,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=689220.0, ans=0.125 2024-08-10 18:24:53,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=689320.0, ans=0.2 2024-08-10 18:25:11,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.59 vs. 
limit=22.5 2024-08-10 18:25:21,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=689520.0, ans=0.0 2024-08-10 18:25:52,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=689620.0, ans=0.2 2024-08-10 18:25:55,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11000, loss[loss=0.1223, beats_loss=0.009774, ecapa_loss=0.000195, whisper_loss=0.1106, over 21914.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01164, ecapa_loss=0.0002354, whisper_loss=0.09602, over 3897513.62 frames. ], batch size: 81, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:26:02,820 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 18:26:08,503 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:26:10,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=689820.0, ans=0.5 2024-08-10 18:26:12,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=689820.0, ans=0.025 2024-08-10 18:26:16,683 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 23 from Vox, 19 from AS 2024-08-10 18:26:21,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=689820.0, ans=0.125 2024-08-10 18:26:41,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+01 2.857e+01 3.230e+01 3.620e+01 6.298e+01, threshold=6.460e+01, percent-clipped=0.0 2024-08-10 18:26:48,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.87 vs.
limit=12.0 2024-08-10 18:26:55,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=690020.0, ans=0.025 2024-08-10 18:26:59,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=690120.0, ans=0.07 2024-08-10 18:27:12,340 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 18:27:13,835 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 from AS 2024-08-10 18:27:16,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11050, loss[loss=0.09698, beats_loss=0.01181, ecapa_loss=0.0002406, whisper_loss=0.08277, over 20645.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01163, ecapa_loss=0.000236, whisper_loss=0.09551, over 3919118.12 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:27:23,910 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 18:27:42,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2024-08-10 18:27:45,368 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 from AS 2024-08-10 18:27:49,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-08-10 18:27:55,802 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts.
34 from LS+wenet, 18 from Vox, 35 from AS 2024-08-10 18:28:19,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=690620.0, ans=0.0 2024-08-10 18:28:21,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=690620.0, ans=0.125 2024-08-10 18:28:36,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11100, loss[loss=0.121, beats_loss=0.01225, ecapa_loss=0.0002706, whisper_loss=0.1061, over 20138.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01157, ecapa_loss=0.0002373, whisper_loss=0.09554, over 3904703.64 frames. ], batch size: 83, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:28:46,231 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 10 from Vox, 29 from AS 2024-08-10 18:28:53,367 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 39 from LS+wenet, 13 from Vox, 34 from AS 2024-08-10 18:28:59,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=690820.0, ans=0.125 2024-08-10 18:29:11,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=690920.0, ans=0.125 2024-08-10 18:29:13,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=690920.0, ans=0.0 2024-08-10 18:29:18,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.710e+01 3.196e+01 3.800e+01 5.125e+01, threshold=6.392e+01, percent-clipped=0.0 2024-08-10 18:29:33,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.50 vs.
limit=12.0 2024-08-10 18:29:54,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=691220.0, ans=0.125 2024-08-10 18:29:54,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11150, loss[loss=0.1083, beats_loss=0.01301, ecapa_loss=0.000237, whisper_loss=0.09296, over 19036.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01159, ecapa_loss=0.0002356, whisper_loss=0.09512, over 3905249.55 frames. ], batch size: 75, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:30:01,609 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 18:30:29,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=691420.0, ans=0.04949747468305833 2024-08-10 18:30:40,193 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 13 from LS+wenet, 24 from Vox, 37 from AS 2024-08-10 18:30:43,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=691520.0, ans=0.125 2024-08-10 18:31:04,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0 2024-08-10 18:31:05,119 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 from AS 2024-08-10 18:31:14,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11200, loss[loss=0.09803, beats_loss=0.01157, ecapa_loss=0.0002442, whisper_loss=0.08402, over 20782.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01164, ecapa_loss=0.0002338, whisper_loss=0.09471, over 3924040.40 frames.
], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:31:20,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=691720.0, ans=0.125 2024-08-10 18:31:37,199 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 from AS 2024-08-10 18:31:45,164 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 17 from Vox, 39 from AS 2024-08-10 18:31:47,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=691920.0, ans=0.125 2024-08-10 18:31:56,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.779e+01 3.196e+01 3.588e+01 6.419e+01, threshold=6.392e+01, percent-clipped=1.0 2024-08-10 18:32:02,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692020.0, ans=0.1 2024-08-10 18:32:04,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=692020.0, ans=0.125 2024-08-10 18:32:06,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2024-08-10 18:32:31,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11250, loss[loss=0.09251, beats_loss=0.009332, ecapa_loss=0.000272, whisper_loss=0.08046, over 15481.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01164, ecapa_loss=0.0002329, whisper_loss=0.09485, over 3898079.37 frames.
], batch size: 64, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:32:35,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692220.0, ans=0.1 2024-08-10 18:32:41,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=692220.0, ans=0.125 2024-08-10 18:32:46,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=692320.0, ans=0.2 2024-08-10 18:32:48,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=8.0 2024-08-10 18:33:10,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=692420.0, ans=0.0 2024-08-10 18:33:12,764 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 from AS 2024-08-10 18:33:15,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=692420.0, ans=0.04949747468305833 2024-08-10 18:33:17,275 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 from AS 2024-08-10 18:33:29,607 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 18:33:36,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=692620.0, ans=0.025 2024-08-10 18:33:51,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11300, loss[loss=0.1088, beats_loss=0.01085, ecapa_loss=0.0002313, whisper_loss=0.09566, over 22652.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01169, ecapa_loss=0.0002328, whisper_loss=0.09479, over 3917695.61 frames.
], batch size: 91, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:33:57,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=692720.0, ans=0.0 2024-08-10 18:34:11,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=692820.0, ans=0.025 2024-08-10 18:34:20,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=692820.0, ans=0.125 2024-08-10 18:34:29,374 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 18:34:33,887 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 18:34:35,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.891e+01 3.346e+01 3.835e+01 5.621e+01, threshold=6.692e+01, percent-clipped=0.0 2024-08-10 18:34:51,404 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-10 18:35:01,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=12.0 2024-08-10 18:35:05,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=693120.0, ans=0.125 2024-08-10 18:35:06,647 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 18:35:09,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11350, loss[loss=0.09661, beats_loss=0.01405, ecapa_loss=0.0001774, whisper_loss=0.08079, over 21061.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01166, ecapa_loss=0.0002321, whisper_loss=0.09543, over 3927357.97 frames. 
], batch size: 84, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:35:10,596 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-10 18:35:10,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=693220.0, ans=0.0 2024-08-10 18:35:14,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2024-08-10 18:35:19,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2024-08-10 18:35:20,618 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 18:35:35,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=693320.0, ans=0.0 2024-08-10 18:35:40,953 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-10 18:35:50,583 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-10 18:36:09,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=693620.0, ans=0.125 2024-08-10 18:36:24,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11400, loss[loss=0.1044, beats_loss=0.01202, ecapa_loss=0.0002425, whisper_loss=0.08998, over 19752.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01169, ecapa_loss=0.0002316, whisper_loss=0.09532, over 3919592.01 frames. ], batch size: 82, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:36:42,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. 
limit=15.0 2024-08-10 18:36:42,984 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 18:37:07,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+01 2.890e+01 3.279e+01 3.857e+01 6.641e+01, threshold=6.557e+01, percent-clipped=0.0 2024-08-10 18:37:09,880 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-10 18:37:17,870 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-10 18:37:19,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=694020.0, ans=0.0 2024-08-10 18:37:29,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=694120.0, ans=0.125 2024-08-10 18:37:29,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694120.0, ans=0.1 2024-08-10 18:37:39,649 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11450, loss[loss=0.09781, beats_loss=0.01166, ecapa_loss=0.0002356, whisper_loss=0.08379, over 17321.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01163, ecapa_loss=0.0002343, whisper_loss=0.09575, over 3923331.62 frames. 
], batch size: 68, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:37:50,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=694220.0, ans=0.125 2024-08-10 18:37:54,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=694320.0, ans=0.125 2024-08-10 18:38:11,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694420.0, ans=0.0 2024-08-10 18:38:16,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694420.0, ans=0.1 2024-08-10 18:38:16,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=694420.0, ans=0.0 2024-08-10 18:38:17,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=694420.0, ans=0.125 2024-08-10 18:38:23,481 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 18:38:30,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694520.0, ans=0.125 2024-08-10 18:38:32,783 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 18:38:37,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=694520.0, ans=0.125 2024-08-10 18:38:46,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. 
limit=15.0 2024-08-10 18:38:53,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=694620.0, ans=0.07 2024-08-10 18:38:57,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11500, loss[loss=0.1254, beats_loss=0.01374, ecapa_loss=0.0002509, whisper_loss=0.1092, over 14387.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01169, ecapa_loss=0.0002329, whisper_loss=0.09543, over 3898657.26 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:38:59,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=694720.0, ans=0.0 2024-08-10 18:39:09,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694720.0, ans=0.1 2024-08-10 18:39:22,195 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 33 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-10 18:39:22,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=12.0 2024-08-10 18:39:39,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=694920.0, ans=0.2 2024-08-10 18:39:40,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.741e+01 3.082e+01 3.618e+01 5.964e+01, threshold=6.164e+01, percent-clipped=0.0 2024-08-10 18:39:51,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=695020.0, ans=0.0 2024-08-10 18:40:04,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=695120.0, ans=0.0 2024-08-10 18:40:14,782 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11550, loss[loss=0.1152, beats_loss=0.01186, ecapa_loss=0.0002085, whisper_loss=0.1013, over 22752.00 frames. 
], tot_loss[loss=0.1097, beats_loss=0.01171, ecapa_loss=0.0002312, whisper_loss=0.0957, over 3903699.39 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:40:23,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2024-08-10 18:40:38,235 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.871e-01 2024-08-10 18:40:47,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=695420.0, ans=0.125 2024-08-10 18:40:48,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=6.0 2024-08-10 18:41:16,469 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 18:41:22,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695620.0, ans=0.1 2024-08-10 18:41:33,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11600, loss[loss=0.09544, beats_loss=0.01034, ecapa_loss=0.0002358, whisper_loss=0.08274, over 21192.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01178, ecapa_loss=0.0002308, whisper_loss=0.09428, over 3919869.27 frames. ], batch size: 83, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:41:38,552 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-10 18:41:44,079 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 18:42:05,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.44 vs. 
limit=15.0 2024-08-10 18:42:16,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.151e+01 2.928e+01 3.314e+01 3.952e+01 8.355e+01, threshold=6.627e+01, percent-clipped=1.0 2024-08-10 18:42:32,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=696120.0, ans=0.0 2024-08-10 18:42:49,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11650, loss[loss=0.09019, beats_loss=0.01069, ecapa_loss=0.0002787, whisper_loss=0.07671, over 18999.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01172, ecapa_loss=0.0002326, whisper_loss=0.09496, over 3919905.02 frames. ], batch size: 79, lr: 1.20e-02, grad_scale: 549755813888.0 2024-08-10 18:42:53,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=696220.0, ans=0.2 2024-08-10 18:43:00,182 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 18:43:04,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2024-08-10 18:43:05,141 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-10 18:43:22,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=696420.0, ans=0.2 2024-08-10 18:43:23,206 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-10 18:43:26,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2024-08-10 18:43:56,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=696620.0, ans=0.125 2024-08-10 18:43:57,218 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
20 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 18:43:59,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11700, loss[loss=0.08435, beats_loss=0.01377, ecapa_loss=0.0002255, whisper_loss=0.06832, over 13442.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01176, ecapa_loss=0.0002318, whisper_loss=0.0953, over 3932148.74 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:44:15,498 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 16 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-10 18:44:19,772 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-10 18:44:22,443 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-10 18:44:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=696820.0, ans=0.0 2024-08-10 18:44:39,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.923e+01 3.356e+01 3.959e+01 5.415e+01, threshold=6.712e+01, percent-clipped=0.0 2024-08-10 18:44:39,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=696920.0, ans=0.125 2024-08-10 18:44:41,969 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 18:45:05,698 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-10 18:45:10,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11750, loss[loss=0.1188, beats_loss=0.01204, ecapa_loss=0.0002416, whisper_loss=0.1044, over 23329.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01175, ecapa_loss=0.0002318, whisper_loss=0.09588, over 3943183.55 frames. 
], batch size: 93, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:45:10,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=697220.0, ans=0.2 2024-08-10 18:45:27,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=697320.0, ans=0.0 2024-08-10 18:45:31,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=697320.0, ans=0.0 2024-08-10 18:45:37,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=697420.0, ans=0.125 2024-08-10 18:45:37,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=697420.0, ans=0.125 2024-08-10 18:45:44,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=697420.0, ans=0.1 2024-08-10 18:45:45,139 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 23 from Vox, 49 fro AS 2024-08-10 18:45:45,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2024-08-10 18:45:49,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=697420.0, ans=0.025 2024-08-10 18:45:55,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=697520.0, ans=0.0 2024-08-10 18:45:56,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=697520.0, ans=0.125 2024-08-10 18:46:00,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=19.14 vs. 
limit=15.0 2024-08-10 18:46:17,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-10 18:46:19,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11800, loss[loss=0.1198, beats_loss=0.01205, ecapa_loss=0.0002187, whisper_loss=0.1055, over 18683.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01175, ecapa_loss=0.0002318, whisper_loss=0.09585, over 3925773.02 frames. ], batch size: 75, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:46:25,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=697720.0, ans=0.2 2024-08-10 18:46:37,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2024-08-10 18:46:40,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=697820.0, ans=0.125 2024-08-10 18:46:46,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=697920.0, ans=0.0 2024-08-10 18:46:53,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=697920.0, ans=0.125 2024-08-10 18:46:54,748 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-10 18:46:57,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.27 vs. 
limit=15.0 2024-08-10 18:46:58,863 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 3.039e+01 3.457e+01 3.903e+01 6.365e+01, threshold=6.915e+01, percent-clipped=0.0 2024-08-10 18:47:00,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-10 18:47:03,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-10 18:47:05,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=698020.0, ans=0.95 2024-08-10 18:47:30,754 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11850, loss[loss=0.1185, beats_loss=0.01017, ecapa_loss=0.0002031, whisper_loss=0.1063, over 18934.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01173, ecapa_loss=0.000231, whisper_loss=0.09573, over 3939846.24 frames. ], batch size: 74, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:47:34,145 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-10 18:47:39,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698220.0, ans=0.1 2024-08-10 18:47:51,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=698320.0, ans=0.125 2024-08-10 18:48:04,356 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 18:48:08,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=698420.0, ans=0.2 2024-08-10 18:48:11,506 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.302e-02 2024-08-10 18:48:19,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-08-10 18:48:26,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698620.0, ans=0.1 2024-08-10 18:48:39,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11900, loss[loss=0.1163, beats_loss=0.01019, ecapa_loss=0.0002241, whisper_loss=0.1038, over 22017.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.01169, ecapa_loss=0.0002314, whisper_loss=0.09592, over 3946518.62 frames. ], batch size: 87, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:48:40,559 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 18:48:48,623 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-10 18:49:02,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=698820.0, ans=0.125 2024-08-10 18:49:17,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.855e+01 3.159e+01 3.498e+01 6.204e+01, threshold=6.318e+01, percent-clipped=0.0 2024-08-10 18:49:17,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=698920.0, ans=0.0 2024-08-10 18:49:20,069 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 18:49:31,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=699020.0, ans=0.0 2024-08-10 18:49:37,411 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 18:49:45,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 18:49:46,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 11950, loss[loss=0.1193, beats_loss=0.01007, ecapa_loss=0.0002645, whisper_loss=0.1066, over 17147.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01168, ecapa_loss=0.0002324, whisper_loss=0.09641, over 3953101.15 frames. ], batch size: 69, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:49:47,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=699220.0, ans=0.2 2024-08-10 18:49:48,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-10 18:50:03,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=699320.0, ans=0.04949747468305833 2024-08-10 18:50:04,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699320.0, ans=0.1 2024-08-10 18:50:44,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=699620.0, ans=10.0 2024-08-10 18:50:45,611 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-10 18:50:53,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12000, loss[loss=0.1355, beats_loss=0.01077, ecapa_loss=0.000207, whisper_loss=0.1227, over 17689.00 frames. 
], tot_loss[loss=0.1105, beats_loss=0.01161, ecapa_loss=0.0002322, whisper_loss=0.09657, over 3921243.45 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 549755813888.0 2024-08-10 18:50:53,565 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 18:51:35,520 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on ASR_libri: loss=0.2622, beats_loss=0, ecapa_loss=0.0007279, whisper_loss=0.255, over 922467.00 frames. 2024-08-10 18:51:54,224 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on SV_voxceleb1: loss=0.006203, beats_loss=0, ecapa_loss=0.0006203, whisper_loss=0, over 939242.00 frames. 2024-08-10 18:53:47,252 INFO [train_multi_KD3.py:1149] (0/4) Epoch 5, validation on AT_audioset: loss=0.02662, beats_loss=0.02662, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 18:53:47,256 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 18:54:07,604 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 18:54:24,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=699920.0, ans=0.05 2024-08-10 18:54:25,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.732e+01 3.134e+01 3.531e+01 7.163e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-10 18:54:26,780 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-10 18:54:29,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=700020.0, ans=0.2 2024-08-10 18:54:38,922 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 18:54:54,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=700220.0, ans=0.0 2024-08-10 18:54:54,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12050, loss[loss=0.125, beats_loss=0.01254, ecapa_loss=0.0002216, whisper_loss=0.1103, over 14330.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.0116, ecapa_loss=0.0002331, whisper_loss=0.09635, over 3912076.94 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:54:56,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=700220.0, ans=0.2 2024-08-10 18:55:01,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=700220.0, ans=0.125 2024-08-10 18:55:03,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=700220.0, ans=0.125 2024-08-10 18:55:09,641 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 18:55:12,367 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-10 18:55:12,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=700320.0, ans=0.0 2024-08-10 18:55:15,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=700320.0, ans=0.0 2024-08-10 18:55:30,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=700420.0, ans=0.0 2024-08-10 18:55:44,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=700520.0, ans=0.0 2024-08-10 18:55:55,431 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-10 18:56:02,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12100, loss[loss=0.1036, beats_loss=0.01132, ecapa_loss=0.0002179, whisper_loss=0.09012, over 14535.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01161, ecapa_loss=0.0002336, whisper_loss=0.09632, over 3898851.49 frames. 
], batch size: 58, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:56:07,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=700720.0, ans=0.125 2024-08-10 18:56:13,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=700720.0, ans=0.125 2024-08-10 18:56:38,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=700920.0, ans=0.125 2024-08-10 18:56:40,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+01 2.774e+01 3.188e+01 3.789e+01 5.825e+01, threshold=6.376e+01, percent-clipped=0.0 2024-08-10 18:56:47,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-08-10 18:56:51,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701020.0, ans=0.1 2024-08-10 18:56:54,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701020.0, ans=0.1 2024-08-10 18:57:09,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12150, loss[loss=0.0999, beats_loss=0.01504, ecapa_loss=0.0002303, whisper_loss=0.08256, over 21774.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01176, ecapa_loss=0.0002331, whisper_loss=0.09501, over 3889666.09 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:57:16,711 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-10 18:57:21,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.25 vs. 
limit=15.0 2024-08-10 18:57:40,241 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 18:57:40,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=701420.0, ans=0.07 2024-08-10 18:57:46,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=701420.0, ans=0.0 2024-08-10 18:57:48,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=701420.0, ans=0.2 2024-08-10 18:58:00,604 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 18:58:00,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=701520.0, ans=0.0 2024-08-10 18:58:03,392 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-10 18:58:14,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=701620.0, ans=0.125 2024-08-10 18:58:17,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12200, loss[loss=0.1036, beats_loss=0.01207, ecapa_loss=0.0001805, whisper_loss=0.08972, over 21551.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01166, ecapa_loss=0.0002325, whisper_loss=0.0952, over 3828902.59 frames. 
], batch size: 82, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:58:19,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701720.0, ans=0.1 2024-08-10 18:58:34,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=701820.0, ans=0.0 2024-08-10 18:58:38,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=701820.0, ans=0.0 2024-08-10 18:58:38,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=701820.0, ans=0.0 2024-08-10 18:58:40,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=701820.0, ans=0.125 2024-08-10 18:58:49,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2024-08-10 18:58:49,705 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-10 18:58:54,770 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 18:58:55,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.216e+01 2.903e+01 3.177e+01 3.659e+01 7.236e+01, threshold=6.353e+01, percent-clipped=1.0 2024-08-10 18:59:02,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=702020.0, ans=0.0 2024-08-10 18:59:11,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.04 vs. 
limit=15.0 2024-08-10 18:59:24,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-10 18:59:25,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12250, loss[loss=0.1002, beats_loss=0.01333, ecapa_loss=0.0001942, whisper_loss=0.0849, over 16017.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01163, ecapa_loss=0.0002317, whisper_loss=0.09557, over 3836317.75 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 18:59:47,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-08-10 19:00:03,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2024-08-10 19:00:07,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=702520.0, ans=0.0 2024-08-10 19:00:08,897 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-10 19:00:13,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=702520.0, ans=0.0 2024-08-10 19:00:16,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702620.0, ans=0.125 2024-08-10 19:00:18,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702620.0, ans=0.0 2024-08-10 19:00:31,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=702720.0, ans=0.125 2024-08-10 19:00:32,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12300, loss[loss=0.09122, beats_loss=0.01333, ecapa_loss=0.0002288, whisper_loss=0.07561, over 22316.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01172, ecapa_loss=0.0002313, whisper_loss=0.09544, over 3838324.66 frames. ], batch size: 91, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:00:38,170 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 19:00:50,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=15.0 2024-08-10 19:01:00,670 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 from AS 2024-08-10 19:01:10,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.851e+01 3.322e+01 3.771e+01 6.110e+01, threshold=6.644e+01, percent-clipped=0.0 2024-08-10 19:01:15,757 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
20 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 19:01:25,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703120.0, ans=0.125 2024-08-10 19:01:29,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=703120.0, ans=0.2 2024-08-10 19:01:37,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=703120.0, ans=0.125 2024-08-10 19:01:39,863 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12350, loss[loss=0.1115, beats_loss=0.01282, ecapa_loss=0.0002243, whisper_loss=0.09643, over 16752.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01167, ecapa_loss=0.0002331, whisper_loss=0.09612, over 3853283.83 frames. ], batch size: 67, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:01:45,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=703220.0, ans=0.0 2024-08-10 19:01:50,862 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 13 from Vox, 45 from AS 2024-08-10 19:02:16,614 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS 2024-08-10 19:02:21,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=703420.0, ans=0.025 2024-08-10 19:02:24,119 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 from AS 2024-08-10 19:02:24,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=703520.0, ans=10.0 2024-08-10 19:02:36,774 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 15 from Vox, 35 from AS 2024-08-10 19:02:41,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.61 vs. limit=10.0 2024-08-10 19:02:42,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=703620.0, ans=0.0 2024-08-10 19:02:45,288 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 32 from Vox, 31 from AS 2024-08-10 19:02:52,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12400, loss[loss=0.0974, beats_loss=0.01285, ecapa_loss=0.0002478, whisper_loss=0.08207, over 21653.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01174, ecapa_loss=0.0002325, whisper_loss=0.09548, over 3883724.19 frames. ], batch size: 91, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:02:54,467 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 19:02:58,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703720.0, ans=0.125 2024-08-10 19:03:29,154 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 21 from Vox, 30 from AS 2024-08-10 19:03:31,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.592e+01 3.077e+01 3.649e+01 6.276e+01, threshold=6.154e+01, percent-clipped=0.0 2024-08-10 19:03:32,340 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 31 from LS+wenet, 26 from Vox, 25 from AS 2024-08-10 19:03:35,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-10 19:03:37,576 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 from AS 2024-08-10 19:03:51,933 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-10 19:04:01,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=12.0 2024-08-10 19:04:03,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12450, loss[loss=0.11, beats_loss=0.01299, ecapa_loss=0.0002264, whisper_loss=0.09477, over 22803.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01161, ecapa_loss=0.0002325, whisper_loss=0.09637, over 3912132.48 frames. ], batch size: 90, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:04:11,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.21 vs. limit=22.5 2024-08-10 19:04:15,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=704320.0, ans=10.0 2024-08-10 19:04:46,703 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 10 from Vox, 31 from AS 2024-08-10 19:04:51,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.20 vs. limit=15.0 2024-08-10 19:04:54,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=12.0 2024-08-10 19:05:12,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12500, loss[loss=0.1236, beats_loss=0.008097, ecapa_loss=0.0002821, whisper_loss=0.1127, over 17436.00 frames. ], tot_loss[loss=0.1112, beats_loss=0.01154, ecapa_loss=0.0002324, whisper_loss=0.0973, over 3924290.48 frames. 
], batch size: 67, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:05:12,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704720.0, ans=0.1 2024-08-10 19:05:12,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=704720.0, ans=0.0 2024-08-10 19:05:13,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=704720.0, ans=0.2 2024-08-10 19:05:28,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-10 19:05:51,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.864e+01 3.204e+01 3.870e+01 6.784e+01, threshold=6.407e+01, percent-clipped=3.0 2024-08-10 19:05:54,334 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 from AS 2024-08-10 19:06:00,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705020.0, ans=0.125 2024-08-10 19:06:01,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=705020.0, ans=0.2 2024-08-10 19:06:02,712 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 from AS 2024-08-10 19:06:13,893 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 from AS 2024-08-10 19:06:19,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=705120.0, ans=0.2 2024-08-10 19:06:21,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12550, loss[loss=0.1104, beats_loss=0.009148, ecapa_loss=0.0002451, whisper_loss=0.09879, over 17372.00 frames. 
], tot_loss[loss=0.1106, beats_loss=0.01147, ecapa_loss=0.0002333, whisper_loss=0.0968, over 3909567.15 frames. ], batch size: 69, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:06:27,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-08-10 19:06:31,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=705220.0, ans=0.0 2024-08-10 19:06:40,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705320.0, ans=0.1 2024-08-10 19:07:05,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2024-08-10 19:07:14,892 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.283e-01 2024-08-10 19:07:18,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.65 vs. limit=15.0 2024-08-10 19:07:27,118 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 19:07:34,881 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12600, loss[loss=0.1027, beats_loss=0.0116, ecapa_loss=0.0002551, whisper_loss=0.08852, over 21678.00 frames. ], tot_loss[loss=0.1111, beats_loss=0.01157, ecapa_loss=0.0002337, whisper_loss=0.09716, over 3911349.87 frames. ], batch size: 91, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:07:42,991 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-10 19:08:08,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.69 vs. 
limit=15.0 2024-08-10 19:08:12,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.787e+01 3.074e+01 3.484e+01 6.689e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-10 19:08:16,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-10 19:08:18,363 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 19:08:18,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706020.0, ans=0.125 2024-08-10 19:08:33,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=706120.0, ans=0.0 2024-08-10 19:08:35,646 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 19:08:37,170 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 19:08:42,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12650, loss[loss=0.09591, beats_loss=0.01246, ecapa_loss=0.00021, whisper_loss=0.08135, over 20012.00 frames. ], tot_loss[loss=0.1106, beats_loss=0.01162, ecapa_loss=0.0002353, whisper_loss=0.09658, over 3918774.14 frames. ], batch size: 78, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:08:53,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=706220.0, ans=0.125 2024-08-10 19:09:04,863 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 from AS 2024-08-10 19:09:13,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=706420.0, ans=0.125 2024-08-10 19:09:21,330 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 19:09:25,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=706520.0, ans=0.125 2024-08-10 19:09:28,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706520.0, ans=0.1 2024-08-10 19:09:29,358 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 23 from Vox, 23 from AS 2024-08-10 19:09:33,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=706520.0, ans=0.0 2024-08-10 19:09:43,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-08-10 19:09:49,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12700, loss[loss=0.1119, beats_loss=0.01167, ecapa_loss=0.0002288, whisper_loss=0.09792, over 22058.00 frames. ], tot_loss[loss=0.1103, beats_loss=0.01178, ecapa_loss=0.0002328, whisper_loss=0.0962, over 3950341.73 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:10:05,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=706820.0, ans=0.125 2024-08-10 19:10:24,962 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
22 from LS+wenet, 13 from Vox, 25 from AS 2024-08-10 19:10:25,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=706920.0, ans=0.125 2024-08-10 19:10:25,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=706920.0, ans=0.125 2024-08-10 19:10:27,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.805e+01 3.118e+01 3.753e+01 7.808e+01, threshold=6.236e+01, percent-clipped=1.0 2024-08-10 19:10:28,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707020.0, ans=0.1 2024-08-10 19:10:44,933 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 19:10:48,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-08-10 19:10:50,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=707120.0, ans=0.0 2024-08-10 19:10:51,167 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 14 from Vox, 19 from AS 2024-08-10 19:10:57,196 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12750, loss[loss=0.1191, beats_loss=0.01222, ecapa_loss=0.0001951, whisper_loss=0.1049, over 22470.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01183, ecapa_loss=0.000232, whisper_loss=0.09607, over 3937123.92 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:11:11,035 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 19:11:24,275 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 from AS 2024-08-10 19:11:33,226 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 30 from Vox, 34 from AS 2024-08-10 19:11:39,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=707520.0, ans=0.0 2024-08-10 19:11:58,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=707620.0, ans=0.125 2024-08-10 19:11:58,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707620.0, ans=0.1 2024-08-10 19:11:59,474 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 from AS 2024-08-10 19:12:01,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=707620.0, ans=0.0 2024-08-10 19:12:01,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707620.0, ans=0.0 2024-08-10 19:12:04,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12800, loss[loss=0.1322, beats_loss=0.009234, ecapa_loss=0.0002092, whisper_loss=0.1209, over 17388.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01174, ecapa_loss=0.0002332, whisper_loss=0.09616, over 3906203.80 frames. ], batch size: 66, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:12:06,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=707720.0, ans=0.125 2024-08-10 19:12:08,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2024-08-10 19:12:10,379 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
26 from LS+wenet, 27 from Vox, 27 from AS 2024-08-10 19:12:10,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=707720.0, ans=0.125 2024-08-10 19:12:15,662 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 from AS 2024-08-10 19:12:18,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=707820.0, ans=0.1 2024-08-10 19:12:25,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=707820.0, ans=0.0 2024-08-10 19:12:27,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=707820.0, ans=0.2 2024-08-10 19:12:41,923 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.755e+01 3.116e+01 3.558e+01 5.514e+01, threshold=6.233e+01, percent-clipped=0.0 2024-08-10 19:12:42,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=707920.0, ans=0.0 2024-08-10 19:12:46,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2024-08-10 19:12:53,956 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 from AS 2024-08-10 19:12:58,086 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.047e+03 2024-08-10 19:13:01,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. 
limit=15.0 2024-08-10 19:13:02,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=708120.0, ans=0.0 2024-08-10 19:13:05,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=708120.0, ans=0.0 2024-08-10 19:13:05,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0 2024-08-10 19:13:11,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12850, loss[loss=0.1149, beats_loss=0.01057, ecapa_loss=0.0002533, whisper_loss=0.1018, over 23550.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01175, ecapa_loss=0.0002338, whisper_loss=0.09573, over 3914254.65 frames. ], batch size: 92, lr: 1.19e-02, grad_scale: 1099511627776.0 2024-08-10 19:13:22,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708220.0, ans=0.1 2024-08-10 19:13:24,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=708320.0, ans=0.0 2024-08-10 19:13:28,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708320.0, ans=0.1 2024-08-10 19:13:29,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-08-10 19:13:40,446 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 19:13:43,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=708420.0, ans=0.125 2024-08-10 19:13:58,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=708520.0, ans=0.125 2024-08-10 19:14:08,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2024-08-10 19:14:12,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=708620.0, ans=0.125 2024-08-10 19:14:15,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-10 19:14:18,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12900, loss[loss=0.115, beats_loss=0.009184, ecapa_loss=0.0002642, whisper_loss=0.1032, over 23128.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01177, ecapa_loss=0.0002344, whisper_loss=0.09532, over 3897209.67 frames. ], batch size: 94, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:14:31,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=708820.0, ans=0.125 2024-08-10 19:14:34,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2024-08-10 19:14:45,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=708920.0, ans=0.125 2024-08-10 19:14:46,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. 
limit=15.0 2024-08-10 19:14:55,258 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.870e+01 3.277e+01 3.550e+01 6.009e+01, threshold=6.554e+01, percent-clipped=0.0 2024-08-10 19:14:55,496 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 19:14:58,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=709020.0, ans=0.0 2024-08-10 19:15:08,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=709020.0, ans=0.125 2024-08-10 19:15:24,329 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 12950, loss[loss=0.109, beats_loss=0.01292, ecapa_loss=0.0002227, whisper_loss=0.09383, over 22099.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01177, ecapa_loss=0.0002335, whisper_loss=0.09521, over 3896164.11 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:15:33,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709220.0, ans=0.125 2024-08-10 19:15:44,195 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 19:15:53,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=709420.0, ans=0.125 2024-08-10 19:16:01,465 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.326e-01 2024-08-10 19:16:11,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=709520.0, ans=0.035 2024-08-10 19:16:30,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13000, loss[loss=0.08442, beats_loss=0.01653, ecapa_loss=0.0002116, whisper_loss=0.06577, over 21399.00 frames. 
], tot_loss[loss=0.109, beats_loss=0.01179, ecapa_loss=0.0002334, whisper_loss=0.09492, over 3919590.60 frames. ], batch size: 91, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:16:44,556 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 19:16:45,909 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 from AS 2024-08-10 19:16:46,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-08-10 19:16:51,239 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 19:16:54,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-10 19:17:06,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.263e+01 2.942e+01 3.329e+01 3.753e+01 5.609e+01, threshold=6.657e+01, percent-clipped=0.0 2024-08-10 19:17:14,967 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 8 from Vox, 32 from AS 2024-08-10 19:17:17,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710020.0, ans=0.1 2024-08-10 19:17:20,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=710020.0, ans=0.125 2024-08-10 19:17:20,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.08 vs. 
limit=10.0 2024-08-10 19:17:28,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=710120.0, ans=0.125 2024-08-10 19:17:34,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2024-08-10 19:17:35,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13050, loss[loss=0.1172, beats_loss=0.01045, ecapa_loss=0.0002258, whisper_loss=0.1044, over 19498.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01178, ecapa_loss=0.0002326, whisper_loss=0.0942, over 3873744.27 frames. ], batch size: 77, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:18:13,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=710420.0, ans=10.0 2024-08-10 19:18:17,196 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 from AS 2024-08-10 19:18:21,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-10 19:18:34,314 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 from AS 2024-08-10 19:18:39,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=710620.0, ans=0.125 2024-08-10 19:18:40,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2024-08-10 19:18:42,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13100, loss[loss=0.1081, beats_loss=0.01199, ecapa_loss=0.0002202, whisper_loss=0.09393, over 19702.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01181, ecapa_loss=0.0002319, whisper_loss=0.09454, over 3870388.21 frames. 
], batch size: 78, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:18:46,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710720.0, ans=0.125 2024-08-10 19:19:18,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=710920.0, ans=0.125 2024-08-10 19:19:19,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.880e+01 3.300e+01 3.880e+01 5.965e+01, threshold=6.600e+01, percent-clipped=0.0 2024-08-10 19:19:22,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=711020.0, ans=0.125 2024-08-10 19:19:25,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711020.0, ans=0.1 2024-08-10 19:19:37,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=711120.0, ans=0.125 2024-08-10 19:19:38,278 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 from AS 2024-08-10 19:19:38,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=711120.0, ans=0.0 2024-08-10 19:19:48,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13150, loss[loss=0.1095, beats_loss=0.008558, ecapa_loss=0.0003217, whisper_loss=0.0977, over 14486.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01181, ecapa_loss=0.000233, whisper_loss=0.09435, over 3872773.38 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:19:51,826 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
22 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-10 19:20:09,256 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:20:09,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=711320.0, ans=0.0 2024-08-10 19:20:10,559 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-10 19:20:10,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=711320.0, ans=0.0 2024-08-10 19:20:35,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=711520.0, ans=0.2 2024-08-10 19:20:38,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711520.0, ans=0.0 2024-08-10 19:20:44,077 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 19:20:59,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13200, loss[loss=0.0874, beats_loss=0.01196, ecapa_loss=0.000268, whisper_loss=0.07277, over 20481.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01177, ecapa_loss=0.0002328, whisper_loss=0.09497, over 3851395.63 frames. ], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:20:59,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=711720.0, ans=0.0 2024-08-10 19:21:01,804 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 19:21:03,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=711720.0, ans=0.0 2024-08-10 19:21:08,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=711720.0, ans=0.0 2024-08-10 19:21:21,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=711820.0, ans=0.0 2024-08-10 19:21:29,696 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-10 19:21:37,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+01 3.006e+01 3.463e+01 3.966e+01 7.207e+01, threshold=6.927e+01, percent-clipped=1.0 2024-08-10 19:21:37,486 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 14 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 19:21:49,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=712020.0, ans=0.125 2024-08-10 19:21:54,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=712120.0, ans=0.2 2024-08-10 19:22:05,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13250, loss[loss=0.09274, beats_loss=0.01063, ecapa_loss=0.000294, whisper_loss=0.07916, over 20679.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01168, ecapa_loss=0.0002332, whisper_loss=0.09481, over 3861993.52 frames. 
], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:22:15,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=712220.0, ans=0.125 2024-08-10 19:22:18,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=712320.0, ans=0.125 2024-08-10 19:22:22,892 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 19:22:23,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2024-08-10 19:22:25,377 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-10 19:22:30,418 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-10 19:23:11,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13300, loss[loss=0.1238, beats_loss=0.01167, ecapa_loss=0.000181, whisper_loss=0.1103, over 23205.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01174, ecapa_loss=0.0002315, whisper_loss=0.09502, over 3863569.96 frames. ], batch size: 91, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:23:16,348 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-10 19:23:24,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=712820.0, ans=0.125 2024-08-10 19:23:31,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=712820.0, ans=0.0 2024-08-10 19:23:50,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=712920.0, ans=0.125 2024-08-10 19:23:50,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.791e+01 3.143e+01 3.422e+01 5.648e+01, threshold=6.287e+01, percent-clipped=0.0 2024-08-10 19:24:03,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=713020.0, ans=0.07 2024-08-10 19:24:19,925 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13350, loss[loss=0.09419, beats_loss=0.01252, ecapa_loss=0.0002286, whisper_loss=0.07938, over 19621.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01179, ecapa_loss=0.0002317, whisper_loss=0.09442, over 3835029.77 frames. ], batch size: 78, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:24:22,839 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 19:24:24,004 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-10 19:24:41,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713320.0, ans=0.1 2024-08-10 19:24:43,024 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 19:24:51,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.36 vs. 
limit=15.0 2024-08-10 19:24:51,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2024-08-10 19:24:55,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=713420.0, ans=0.125 2024-08-10 19:24:59,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-10 19:25:07,585 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-10 19:25:23,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713620.0, ans=0.125 2024-08-10 19:25:23,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-10 19:25:26,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13400, loss[loss=0.1238, beats_loss=0.00846, ecapa_loss=0.0002932, whisper_loss=0.1124, over 20884.00 frames. ], tot_loss[loss=0.109, beats_loss=0.0118, ecapa_loss=0.0002324, whisper_loss=0.09487, over 3865392.85 frames. ], batch size: 83, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:25:31,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. limit=10.0 2024-08-10 19:25:33,706 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 30 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 19:25:35,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=713720.0, ans=0.125 2024-08-10 19:25:39,694 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 19:25:47,571 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 19:26:01,707 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-10 19:26:05,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.716e+01 3.114e+01 3.677e+01 5.856e+01, threshold=6.229e+01, percent-clipped=0.0 2024-08-10 19:26:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=714020.0, ans=0.125 2024-08-10 19:26:31,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-10 19:26:38,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13450, loss[loss=0.09856, beats_loss=0.01297, ecapa_loss=0.000245, whisper_loss=0.08314, over 21639.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01182, ecapa_loss=0.0002312, whisper_loss=0.09504, over 3906735.41 frames. ], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:26:38,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714220.0, ans=0.1 2024-08-10 19:27:21,188 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 19:27:39,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=714520.0, ans=15.0 2024-08-10 19:27:40,534 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 19:28:02,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714620.0, ans=0.1 2024-08-10 19:28:18,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13500, loss[loss=0.1058, beats_loss=0.01405, ecapa_loss=0.0002082, whisper_loss=0.08969, over 23323.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01173, ecapa_loss=0.0002326, whisper_loss=0.09552, over 3914811.66 frames. ], batch size: 92, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:28:28,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=714720.0, ans=0.035 2024-08-10 19:28:54,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2024-08-10 19:29:11,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.843e+01 3.302e+01 3.860e+01 1.367e+02, threshold=6.604e+01, percent-clipped=1.0 2024-08-10 19:29:15,866 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 19:29:21,695 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-10 19:29:21,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=715020.0, ans=0.125 2024-08-10 19:29:35,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=715120.0, ans=0.125 2024-08-10 19:29:40,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.11 vs. 
limit=22.5 2024-08-10 19:29:43,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13550, loss[loss=0.106, beats_loss=0.009635, ecapa_loss=0.0002413, whisper_loss=0.09393, over 16619.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01169, ecapa_loss=0.0002334, whisper_loss=0.09566, over 3910966.84 frames. ], batch size: 64, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:29:50,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=715220.0, ans=22.5 2024-08-10 19:30:14,628 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 24 from Vox, 15 fro AS 2024-08-10 19:30:22,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=715420.0, ans=0.125 2024-08-10 19:30:33,809 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-10 19:30:47,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=715620.0, ans=0.2 2024-08-10 19:30:49,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=715620.0, ans=0.0 2024-08-10 19:30:56,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13600, loss[loss=0.1199, beats_loss=0.009013, ecapa_loss=0.0002566, whisper_loss=0.1083, over 22240.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01171, ecapa_loss=0.0002323, whisper_loss=0.09515, over 3885698.15 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:31:07,323 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 19:31:16,749 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
21 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-10 19:31:39,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=715920.0, ans=0.0 2024-08-10 19:31:40,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 3.018e+01 3.345e+01 4.176e+01 9.829e+01, threshold=6.690e+01, percent-clipped=2.0 2024-08-10 19:31:44,883 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 19:31:56,684 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:31:59,665 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 30 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-10 19:32:07,569 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 19:32:09,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=716120.0, ans=0.0 2024-08-10 19:32:13,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13650, loss[loss=0.1144, beats_loss=0.01039, ecapa_loss=0.0002466, whisper_loss=0.1016, over 22754.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01179, ecapa_loss=0.000231, whisper_loss=0.09479, over 3905671.93 frames. ], batch size: 91, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:32:56,167 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-10 19:33:10,480 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 19:33:28,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=716720.0, ans=0.1 2024-08-10 19:33:29,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13700, loss[loss=0.1303, beats_loss=0.009914, ecapa_loss=0.0002439, whisper_loss=0.1179, over 16201.00 frames. 
], tot_loss[loss=0.1092, beats_loss=0.01183, ecapa_loss=0.0002286, whisper_loss=0.09512, over 3906202.38 frames. ], batch size: 64, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:34:02,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=716920.0, ans=0.1 2024-08-10 19:34:09,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=12.0 2024-08-10 19:34:12,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.265e+01 2.823e+01 3.317e+01 3.890e+01 6.067e+01, threshold=6.634e+01, percent-clipped=0.0 2024-08-10 19:34:18,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=6.0 2024-08-10 19:34:19,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717020.0, ans=0.1 2024-08-10 19:34:23,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=717020.0, ans=0.0 2024-08-10 19:34:31,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=717120.0, ans=0.035 2024-08-10 19:34:41,343 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-10 19:34:46,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13750, loss[loss=0.1052, beats_loss=0.01654, ecapa_loss=0.0001516, whisper_loss=0.08718, over 22564.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01184, ecapa_loss=0.0002281, whisper_loss=0.09557, over 3917964.91 frames. 
], batch size: 91, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:34:56,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=717220.0, ans=0.125 2024-08-10 19:35:04,543 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-10 19:35:04,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=717320.0, ans=0.0 2024-08-10 19:35:09,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5 2024-08-10 19:35:15,070 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-10 19:35:16,282 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 19:35:27,996 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 34 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-10 19:35:38,541 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 19:35:47,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-10 19:35:56,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0 2024-08-10 19:36:02,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13800, loss[loss=0.1027, beats_loss=0.0113, ecapa_loss=0.0002489, whisper_loss=0.08894, over 17497.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01178, ecapa_loss=0.0002278, whisper_loss=0.09547, over 3884305.64 frames. ], batch size: 70, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:36:18,963 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-10 19:36:21,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2024-08-10 19:36:27,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=717820.0, ans=0.0 2024-08-10 19:36:32,776 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-10 19:36:34,485 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-10 19:36:44,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=717920.0, ans=0.0 2024-08-10 19:36:46,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+01 2.754e+01 3.224e+01 3.629e+01 6.153e+01, threshold=6.448e+01, percent-clipped=0.0 2024-08-10 19:37:21,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13850, loss[loss=0.1124, beats_loss=0.01311, ecapa_loss=0.0002048, whisper_loss=0.0972, over 22334.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01177, ecapa_loss=0.0002283, whisper_loss=0.09489, over 3897666.42 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:37:23,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.17 vs. 
limit=15.0 2024-08-10 19:37:32,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=718220.0, ans=0.125 2024-08-10 19:37:47,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=718320.0, ans=0.2 2024-08-10 19:37:48,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=718320.0, ans=0.2 2024-08-10 19:38:05,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-10 19:38:06,693 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 19:38:33,195 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-10 19:38:40,257 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.073e-01 2024-08-10 19:38:41,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13900, loss[loss=0.0945, beats_loss=0.01271, ecapa_loss=0.0002076, whisper_loss=0.07972, over 16112.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01173, ecapa_loss=0.0002302, whisper_loss=0.09504, over 3902952.37 frames. ], batch size: 67, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:38:50,786 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-10 19:38:56,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718820.0, ans=0.1 2024-08-10 19:38:59,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=718820.0, ans=0.0 2024-08-10 19:39:01,368 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-10 19:39:05,380 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-10 19:39:05,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-10 19:39:17,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=718920.0, ans=0.125 2024-08-10 19:39:24,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.171e+01 2.959e+01 3.276e+01 3.717e+01 7.288e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 19:39:25,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718920.0, ans=0.125 2024-08-10 19:39:35,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-10 19:39:38,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=719020.0, ans=0.0 2024-08-10 19:39:38,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=12.0 2024-08-10 19:39:47,842 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 19:39:49,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=719120.0, ans=0.0 2024-08-10 19:39:57,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 13950, loss[loss=0.1086, beats_loss=0.01048, ecapa_loss=0.000184, whisper_loss=0.09625, over 14652.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01159, ecapa_loss=0.0002314, whisper_loss=0.09555, over 3897329.22 frames. 
], batch size: 54, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:40:11,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719320.0, ans=0.1 2024-08-10 19:40:17,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=719320.0, ans=0.2 2024-08-10 19:40:20,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=719320.0, ans=0.125 2024-08-10 19:40:34,068 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 19:40:34,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.81 vs. limit=15.0 2024-08-10 19:40:35,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=719420.0, ans=0.125 2024-08-10 19:40:44,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=719520.0, ans=0.125 2024-08-10 19:40:53,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0 2024-08-10 19:41:02,980 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.010e+00 2024-08-10 19:41:13,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14000, loss[loss=0.1123, beats_loss=0.01268, ecapa_loss=0.0002158, whisper_loss=0.09744, over 22508.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01157, ecapa_loss=0.0002309, whisper_loss=0.0961, over 3922870.83 frames. 
], batch size: 88, lr: 1.18e-02, grad_scale: 1099511627776.0 2024-08-10 19:41:55,844 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-72000.pt 2024-08-10 19:41:59,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.816e+01 3.391e+01 3.815e+01 6.287e+01, threshold=6.783e+01, percent-clipped=0.0 2024-08-10 19:42:01,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-10 19:42:11,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720020.0, ans=0.125 2024-08-10 19:42:13,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=720020.0, ans=0.0 2024-08-10 19:42:34,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14050, loss[loss=0.1279, beats_loss=0.009312, ecapa_loss=0.0002655, whisper_loss=0.1159, over 22712.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01162, ecapa_loss=0.0002306, whisper_loss=0.09621, over 3954464.88 frames. ], batch size: 89, lr: 1.18e-02, grad_scale: 2199023255552.0 2024-08-10 19:42:35,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2024-08-10 19:42:35,692 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-10 19:42:53,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=720320.0, ans=0.125 2024-08-10 19:43:02,072 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-10 19:43:03,912 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-10 19:43:12,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-10 19:43:12,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=720420.0, ans=15.0 2024-08-10 19:43:51,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14100, loss[loss=0.1192, beats_loss=0.01034, ecapa_loss=0.0002129, whisper_loss=0.1067, over 15673.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01159, ecapa_loss=0.0002317, whisper_loss=0.09608, over 3896776.28 frames. ], batch size: 61, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:43:53,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=720720.0, ans=0.125 2024-08-10 19:43:54,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720720.0, ans=0.125 2024-08-10 19:44:25,706 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-10 19:44:27,146 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 19:44:32,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.752e+01 3.141e+01 3.762e+01 7.016e+01, threshold=6.282e+01, percent-clipped=2.0 2024-08-10 19:44:38,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=721020.0, ans=0.125 2024-08-10 19:44:45,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=721020.0, ans=0.2 2024-08-10 19:44:56,490 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-10 19:45:00,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=721120.0, ans=0.125 2024-08-10 19:45:06,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14150, loss[loss=0.07966, beats_loss=0.009142, ecapa_loss=0.0002415, whisper_loss=0.06811, over 14467.00 frames. ], tot_loss[loss=0.1101, beats_loss=0.01164, ecapa_loss=0.0002314, whisper_loss=0.09613, over 3878817.75 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:45:14,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=721220.0, ans=0.0 2024-08-10 19:45:18,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2024-08-10 19:45:23,027 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
10 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 19:45:25,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=721320.0, ans=0.125 2024-08-10 19:45:30,915 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:45:31,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=12.0 2024-08-10 19:45:39,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.83 vs. limit=12.0 2024-08-10 19:45:41,028 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 19:45:45,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=721420.0, ans=0.125 2024-08-10 19:45:48,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=721420.0, ans=0.2 2024-08-10 19:45:56,136 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 19:45:56,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=721520.0, ans=0.1 2024-08-10 19:45:59,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721520.0, ans=0.1 2024-08-10 19:46:03,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=721520.0, ans=0.95 2024-08-10 19:46:20,315 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 22 from Vox, 34 from AS 2024-08-10 19:46:21,283 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14200, loss[loss=0.1265, beats_loss=0.009444, ecapa_loss=0.000238, whisper_loss=0.1147, over 23423.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01169, ecapa_loss=0.0002309, whisper_loss=0.09531, over 3858257.62 frames. ], batch size: 91, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:46:43,058 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 from AS 2024-08-10 19:46:51,170 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 from AS 2024-08-10 19:47:03,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721920.0, ans=0.1 2024-08-10 19:47:04,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+01 2.830e+01 3.191e+01 3.752e+01 5.497e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-10 19:47:04,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-08-10 19:47:24,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=722120.0, ans=0.0 2024-08-10 19:47:38,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14250, loss[loss=0.1236, beats_loss=0.008619, ecapa_loss=0.0002852, whisper_loss=0.1121, over 23025.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01174, ecapa_loss=0.0002311, whisper_loss=0.09455, over 3861495.77 frames. ], batch size: 94, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:47:48,695 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
25 from LS+wenet, 11 from Vox, 31 from AS 2024-08-10 19:47:52,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-08-10 19:47:57,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722320.0, ans=0.1 2024-08-10 19:47:58,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=722320.0, ans=0.0 2024-08-10 19:48:01,670 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 from AS 2024-08-10 19:48:08,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=722320.0, ans=10.0 2024-08-10 19:48:23,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=722420.0, ans=0.0 2024-08-10 19:48:25,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=722520.0, ans=0.0 2024-08-10 19:48:26,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=15.0 2024-08-10 19:48:34,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-10 19:48:50,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2024-08-10 19:48:50,879 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 15 from Vox, 30 from AS 2024-08-10 19:48:56,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14300, loss[loss=0.1022, beats_loss=0.01394, ecapa_loss=0.0001862, whisper_loss=0.08644, over 18653.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01173, ecapa_loss=0.0002302, whisper_loss=0.09478, over 3861598.37 frames. ], batch size: 73, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:49:12,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=722820.0, ans=0.125 2024-08-10 19:49:13,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2024-08-10 19:49:22,246 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 30 from Vox, 34 from AS 2024-08-10 19:49:26,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722920.0, ans=0.125 2024-08-10 19:49:32,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=722920.0, ans=0.125 2024-08-10 19:49:32,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=722920.0, ans=0.0 2024-08-10 19:49:40,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.839e+01 3.149e+01 3.823e+01 7.710e+01, threshold=6.298e+01, percent-clipped=1.0 2024-08-10 19:49:42,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=723020.0, ans=0.2 2024-08-10 19:49:51,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.88 vs. 
limit=15.0 2024-08-10 19:50:15,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14350, loss[loss=0.1403, beats_loss=0.01, ecapa_loss=0.0001745, whisper_loss=0.1286, over 24815.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01171, ecapa_loss=0.0002305, whisper_loss=0.0947, over 3871126.26 frames. ], batch size: 89, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:50:38,686 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 35 from LS+wenet, 13 from Vox, 35 from AS 2024-08-10 19:50:48,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2024-08-10 19:50:57,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=723420.0, ans=0.2 2024-08-10 19:50:58,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=723520.0, ans=0.125 2024-08-10 19:51:05,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723520.0, ans=0.1 2024-08-10 19:51:11,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0 2024-08-10 19:51:20,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723620.0, ans=0.125 2024-08-10 19:51:30,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14400, loss[loss=0.1297, beats_loss=0.008927, ecapa_loss=0.0002264, whisper_loss=0.1186, over 15302.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01176, ecapa_loss=0.0002306, whisper_loss=0.09492, over 3895753.04 frames. 
], batch size: 59, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:51:45,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-08-10 19:51:54,961 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 from AS 2024-08-10 19:51:55,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723820.0, ans=0.1 2024-08-10 19:52:06,963 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 19:52:11,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.768e+01 3.038e+01 3.446e+01 5.868e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-10 19:52:32,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=724120.0, ans=0.09899494936611666 2024-08-10 19:52:39,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=22.5 2024-08-10 19:52:47,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 5, batch 14450, loss[loss=0.1074, beats_loss=0.009937, ecapa_loss=0.0002712, whisper_loss=0.09474, over 13608.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01177, ecapa_loss=0.0002316, whisper_loss=0.09506, over 3898158.52 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 2199023255552.0 2024-08-10 19:52:52,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=724220.0, ans=0.2 2024-08-10 19:52:56,589 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 19:53:02,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=724320.0, ans=0.125 2024-08-10 19:53:20,107 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 from AS 2024-08-10 19:53:22,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724420.0, ans=0.125 2024-08-10 19:53:31,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=724420.0, ans=0.0 2024-08-10 19:53:49,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0 2024-08-10 19:53:54,857 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-5.pt 2024-08-10 19:54:34,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 0, loss[loss=0.09552, beats_loss=0.01111, ecapa_loss=0.0002648, whisper_loss=0.08176, over 22110.00 frames. ], tot_loss[loss=0.09552, beats_loss=0.01111, ecapa_loss=0.0002648, whisper_loss=0.08176, over 22110.00 frames. ], batch size: 92, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:54:34,068 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 19:55:10,683 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on ASR_libri: loss=0.2614, beats_loss=0, ecapa_loss=0.0007237, whisper_loss=0.2541, over 922467.00 frames. 2024-08-10 19:55:26,916 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on SV_voxceleb1: loss=0.006205, beats_loss=0, ecapa_loss=0.0006205, whisper_loss=0, over 939242.00 frames. 
2024-08-10 19:57:12,758 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on AT_audioset: loss=0.02628, beats_loss=0.02628, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-10 19:57:12,761 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 19:57:14,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724650.0, ans=0.125 2024-08-10 19:57:17,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2024-08-10 19:58:00,324 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS 2024-08-10 19:58:17,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=724850.0, ans=0.125 2024-08-10 19:58:20,428 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 15 from Vox, 50 from AS 2024-08-10 19:58:25,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=724850.0, ans=0.125 2024-08-10 19:58:35,670 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 from AS 2024-08-10 19:58:40,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+01 3.034e+01 3.419e+01 4.003e+01 7.099e+01, threshold=6.838e+01, percent-clipped=1.0 2024-08-10 19:58:44,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2024-08-10 19:59:15,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 50, loss[loss=0.1076, beats_loss=0.01286, ecapa_loss=0.000233, whisper_loss=0.09243, over 21306.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01187, ecapa_loss=0.0002413, whisper_loss=0.09078, over 879409.13 frames. ], batch size: 87, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 19:59:15,421 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS 2024-08-10 19:59:25,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=725150.0, ans=0.0 2024-08-10 19:59:25,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=725150.0, ans=0.0 2024-08-10 19:59:39,657 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 19:59:47,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=725250.0, ans=0.125 2024-08-10 20:00:18,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.21 vs. limit=10.0 2024-08-10 20:01:02,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-08-10 20:01:09,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 100, loss[loss=0.09617, beats_loss=0.009616, ecapa_loss=0.0002424, whisper_loss=0.08413, over 15526.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01167, ecapa_loss=0.0002312, whisper_loss=0.09257, over 1537307.72 frames. ], batch size: 59, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:01:55,024 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 20:02:20,161 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 25 from Vox, 22 from AS 2024-08-10 20:02:26,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=725950.0, ans=0.125 2024-08-10 20:02:28,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.454e+01 2.882e+01 3.222e+01 3.754e+01 5.300e+01, threshold=6.444e+01, percent-clipped=0.0 2024-08-10 20:02:34,215 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 from AS 2024-08-10 20:02:44,752 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS 2024-08-10 20:02:58,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 150, loss[loss=0.1049, beats_loss=0.009539, ecapa_loss=0.0002872, whisper_loss=0.0925, over 19673.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01154, ecapa_loss=0.0002298, whisper_loss=0.09317, over 2044381.05 frames. ], batch size: 80, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:03:05,641 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 21 from Vox, 22 from AS 2024-08-10 20:03:09,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-08-10 20:03:32,590 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 20:03:34,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726350.0, ans=0.0 2024-08-10 20:04:17,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=22.5 2024-08-10 20:04:22,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 200, loss[loss=0.11, beats_loss=0.01293, ecapa_loss=0.0002156, whisper_loss=0.09494, over 19644.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01132, ecapa_loss=0.000229, whisper_loss=0.0948, over 2438653.46 frames. ], batch size: 77, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:04:27,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=726650.0, ans=0.2 2024-08-10 20:04:31,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=726650.0, ans=0.125 2024-08-10 20:04:52,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=726750.0, ans=0.125 2024-08-10 20:05:00,343 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 20:05:02,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726850.0, ans=0.1 2024-08-10 20:05:10,509 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 32 from LS+wenet, 17 from Vox, 25 from AS 2024-08-10 20:05:18,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=726950.0, ans=0.015 2024-08-10 20:05:19,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.639e+01 2.951e+01 3.334e+01 6.571e+01, threshold=5.903e+01, percent-clipped=1.0 2024-08-10 20:05:31,585 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 14 from Vox, 23 from AS 2024-08-10 20:05:39,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=727050.0, ans=0.125 2024-08-10 20:05:41,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 250, loss[loss=0.09866, beats_loss=0.01174, ecapa_loss=0.0002412, whisper_loss=0.08451, over 18179.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01135, ecapa_loss=0.0002277, whisper_loss=0.09464, over 2734065.17 frames. ], batch size: 74, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:05:41,810 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 26 from LS+wenet, 16 from Vox, 23 from AS 2024-08-10 20:05:48,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-10 20:06:14,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727350.0, ans=0.125 2024-08-10 20:06:21,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727350.0, ans=0.125 2024-08-10 20:06:34,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=727450.0, ans=0.125 2024-08-10 20:06:35,762 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS 2024-08-10 20:06:53,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2024-08-10 20:06:53,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 300, loss[loss=0.0853, beats_loss=0.01331, ecapa_loss=0.0002615, whisper_loss=0.06938, over 13652.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01143, ecapa_loss=0.0002264, whisper_loss=0.09413, over 2949564.60 frames. 
], batch size: 59, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:07:02,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=727650.0, ans=0.125 2024-08-10 20:07:10,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727750.0, ans=0.1 2024-08-10 20:07:11,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.88 vs. limit=22.5 2024-08-10 20:07:12,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=727750.0, ans=0.125 2024-08-10 20:07:27,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-10 20:07:33,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=727850.0, ans=0.2 2024-08-10 20:07:44,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=727950.0, ans=0.05 2024-08-10 20:07:45,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.770e+01 3.156e+01 3.793e+01 6.617e+01, threshold=6.313e+01, percent-clipped=1.0 2024-08-10 20:07:45,679 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 20:07:49,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=727950.0, ans=0.0 2024-08-10 20:07:52,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. 
limit=22.5 2024-08-10 20:08:07,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-10 20:08:07,942 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 350, loss[loss=0.09753, beats_loss=0.01163, ecapa_loss=0.0002162, whisper_loss=0.08373, over 20646.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01143, ecapa_loss=0.0002246, whisper_loss=0.09417, over 3141459.63 frames. ], batch size: 81, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:08:08,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=728150.0, ans=0.05 2024-08-10 20:08:10,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=728150.0, ans=0.125 2024-08-10 20:08:12,707 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 from AS 2024-08-10 20:08:18,366 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 23 from Vox, 18 from AS 2024-08-10 20:08:36,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-10 20:08:44,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=15.0 2024-08-10 20:08:47,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=728350.0, ans=0.125 2024-08-10 20:08:58,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728450.0, ans=0.1 2024-08-10 20:09:08,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=728550.0, ans=0.125 2024-08-10 20:09:12,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=728550.0, ans=0.05 2024-08-10 20:09:21,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 400, loss[loss=0.1049, beats_loss=0.01143, ecapa_loss=0.0002094, whisper_loss=0.09133, over 21736.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002239, whisper_loss=0.09349, over 3270616.64 frames. ], batch size: 88, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:09:28,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=728650.0, ans=0.125 2024-08-10 20:10:03,669 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 30 from Vox, 25 from AS 2024-08-10 20:10:12,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.814e+01 3.145e+01 3.714e+01 1.358e+02, threshold=6.291e+01, percent-clipped=2.0 2024-08-10 20:10:20,673 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-10 20:10:32,427 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 20:10:33,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 450, loss[loss=0.09389, beats_loss=0.0122, ecapa_loss=0.0002328, whisper_loss=0.07936, over 14638.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.000224, whisper_loss=0.09318, over 3360868.76 frames. ], batch size: 58, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:10:59,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=729250.0, ans=0.05 2024-08-10 20:11:04,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729350.0, ans=0.1 2024-08-10 20:11:10,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-08-10 20:11:32,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=729550.0, ans=0.125 2024-08-10 20:11:34,297 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 13 from Vox, 19 from AS 2024-08-10 20:11:38,438 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS 2024-08-10 20:11:47,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 500, loss[loss=0.08925, beats_loss=0.01127, ecapa_loss=0.0002793, whisper_loss=0.07519, over 14890.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01143, ecapa_loss=0.0002247, whisper_loss=0.09376, over 3496708.67 frames. ], batch size: 61, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:12:11,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0 2024-08-10 20:12:15,130 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
26 from LS+wenet, 25 from Vox, 31 from AS 2024-08-10 20:12:41,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=729950.0, ans=0.125 2024-08-10 20:12:41,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.311e+01 2.724e+01 3.066e+01 3.405e+01 6.797e+01, threshold=6.131e+01, percent-clipped=1.0 2024-08-10 20:12:43,256 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 from AS 2024-08-10 20:13:02,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 550, loss[loss=0.1003, beats_loss=0.01202, ecapa_loss=0.0002172, whisper_loss=0.08615, over 21809.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0114, ecapa_loss=0.0002244, whisper_loss=0.09384, over 3581442.02 frames. ], batch size: 86, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:13:04,529 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 from AS 2024-08-10 20:13:10,683 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 15 from Vox, 19 from AS 2024-08-10 20:13:13,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=730150.0, ans=0.2 2024-08-10 20:13:20,751 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-10 20:13:20,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730250.0, ans=0.1 2024-08-10 20:13:33,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=730350.0, ans=0.0 2024-08-10 20:13:37,016 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 27 from Vox, 32 from AS 2024-08-10 20:13:42,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=730350.0, ans=0.125 2024-08-10 20:14:03,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=730450.0, ans=0.0 2024-08-10 20:14:26,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730550.0, ans=0.125 2024-08-10 20:14:29,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.34 vs. limit=22.5 2024-08-10 20:14:34,847 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 20:14:42,507 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 600, loss[loss=0.1127, beats_loss=0.008808, ecapa_loss=0.00027, whisper_loss=0.1012, over 21543.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01136, ecapa_loss=0.0002217, whisper_loss=0.09485, over 3668099.25 frames. ], batch size: 88, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:14:50,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730650.0, ans=0.1 2024-08-10 20:15:05,405 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 from AS 2024-08-10 20:15:07,239 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 from AS 2024-08-10 20:15:16,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.76 vs. 
limit=22.5 2024-08-10 20:15:36,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=730950.0, ans=0.0 2024-08-10 20:15:37,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.552e+01 2.834e+01 3.243e+01 4.859e+01, threshold=5.668e+01, percent-clipped=0.0 2024-08-10 20:15:47,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-10 20:16:03,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=731050.0, ans=0.0 2024-08-10 20:16:08,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 650, loss[loss=0.1077, beats_loss=0.01286, ecapa_loss=0.0001784, whisper_loss=0.0931, over 17304.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01145, ecapa_loss=0.0002211, whisper_loss=0.09477, over 3682661.35 frames. ], batch size: 67, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:16:10,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=731150.0, ans=0.125 2024-08-10 20:16:19,062 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 20:16:58,748 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS 2024-08-10 20:17:21,615 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 from AS 2024-08-10 20:17:32,852 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 from AS 2024-08-10 20:17:39,771 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-10 20:17:49,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-08-10 20:17:50,840 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 700, loss[loss=0.1099, beats_loss=0.01142, ecapa_loss=0.00025, whisper_loss=0.09601, over 17290.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01151, ecapa_loss=0.0002201, whisper_loss=0.09474, over 3721656.01 frames. ], batch size: 69, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:18:33,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=731750.0, ans=0.0 2024-08-10 20:18:35,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5 2024-08-10 20:18:39,800 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 12 from Vox, 33 from AS 2024-08-10 20:18:47,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=731850.0, ans=0.125 2024-08-10 20:19:15,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.234e+01 2.666e+01 3.015e+01 3.385e+01 4.873e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 20:19:49,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 750, loss[loss=0.1185, beats_loss=0.01242, ecapa_loss=0.0001817, whisper_loss=0.1042, over 17669.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01159, ecapa_loss=0.0002176, whisper_loss=0.09451, over 3715267.89 frames. ], batch size: 67, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:19:57,161 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 from AS 2024-08-10 20:20:06,472 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 26 from Vox, 30 from AS 2024-08-10 20:20:22,163 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 from AS 2024-08-10 20:20:33,174 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 from AS 2024-08-10 20:20:43,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=732350.0, ans=0.125 2024-08-10 20:20:48,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-10 20:21:41,050 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 31 from Vox, 24 from AS 2024-08-10 20:21:48,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 800, loss[loss=0.1179, beats_loss=0.01235, ecapa_loss=0.000206, whisper_loss=0.1035, over 14547.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01165, ecapa_loss=0.0002163, whisper_loss=0.0939, over 3739190.99 frames. ], batch size: 57, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:21:48,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=732650.0, ans=0.2 2024-08-10 20:22:15,643 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 30 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 20:22:15,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=732750.0, ans=0.09899494936611666 2024-08-10 20:22:33,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=732850.0, ans=0.125 2024-08-10 20:22:38,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2024-08-10 20:22:38,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2024-08-10 20:23:10,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-10 20:23:13,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.177e+01 2.807e+01 3.275e+01 3.755e+01 8.468e+01, threshold=6.551e+01, percent-clipped=2.0 2024-08-10 20:23:43,188 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 850, loss[loss=0.1284, beats_loss=0.009831, ecapa_loss=0.0002227, whisper_loss=0.1163, over 18522.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01164, ecapa_loss=0.0002155, whisper_loss=0.09298, over 3757926.86 frames. ], batch size: 69, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:23:51,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.46 vs. limit=22.5 2024-08-10 20:23:57,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=733150.0, ans=0.0 2024-08-10 20:24:33,022 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 from AS 2024-08-10 20:24:39,831 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 from AS 2024-08-10 20:24:41,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=733450.0, ans=0.0 2024-08-10 20:24:44,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.46 vs. 
limit=15.0 2024-08-10 20:24:51,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=733550.0, ans=0.125 2024-08-10 20:25:09,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 900, loss[loss=0.1161, beats_loss=0.01293, ecapa_loss=0.0002003, whisper_loss=0.1012, over 15597.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01158, ecapa_loss=0.0002159, whisper_loss=0.09345, over 3780805.17 frames. ], batch size: 62, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:25:29,326 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 from AS 2024-08-10 20:25:43,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=733750.0, ans=0.125 2024-08-10 20:26:01,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=733850.0, ans=0.125 2024-08-10 20:26:12,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.731e+01 3.012e+01 3.536e+01 7.102e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-10 20:26:30,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734050.0, ans=0.125 2024-08-10 20:26:36,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2024-08-10 20:26:38,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 950, loss[loss=0.1118, beats_loss=0.01023, ecapa_loss=0.0002187, whisper_loss=0.09938, over 14693.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01157, ecapa_loss=0.0002146, whisper_loss=0.09382, over 3786062.59 frames. 
], batch size: 56, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:26:42,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734150.0, ans=0.125 2024-08-10 20:26:54,415 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-10 20:27:19,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=734350.0, ans=0.2 2024-08-10 20:27:28,824 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-10 20:27:36,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=734450.0, ans=0.125 2024-08-10 20:27:36,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734450.0, ans=0.125 2024-08-10 20:27:39,178 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:27:41,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=734450.0, ans=0.0 2024-08-10 20:27:47,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=734550.0, ans=0.04949747468305833 2024-08-10 20:27:53,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=734550.0, ans=0.0 2024-08-10 20:27:57,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=734550.0, ans=0.02 2024-08-10 20:27:58,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734550.0, ans=0.1 2024-08-10 20:28:01,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 
6, batch 1000, loss[loss=0.1173, beats_loss=0.01407, ecapa_loss=0.0001775, whisper_loss=0.1014, over 16428.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01158, ecapa_loss=0.0002143, whisper_loss=0.09364, over 3792440.32 frames. ], batch size: 66, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:28:07,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.59 vs. limit=22.5 2024-08-10 20:28:10,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=734650.0, ans=0.0 2024-08-10 20:28:49,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734850.0, ans=0.1 2024-08-10 20:28:50,483 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 from AS 2024-08-10 20:28:50,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734950.0, ans=0.125 2024-08-10 20:28:52,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=734950.0, ans=0.125 2024-08-10 20:28:55,275 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 17 from Vox, 42 from AS 2024-08-10 20:29:00,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.726e+01 3.092e+01 3.601e+01 1.041e+02, threshold=6.184e+01, percent-clipped=1.0 2024-08-10 20:29:01,861 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 24 from Vox, 46 from AS 2024-08-10 20:29:05,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734950.0, ans=0.1 2024-08-10 20:29:25,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1050, loss[loss=0.08841, beats_loss=0.01202, ecapa_loss=0.0001856, whisper_loss=0.07453, over 19169.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01163, ecapa_loss=0.0002148, whisper_loss=0.09268, over 3803587.07 frames. ], batch size: 77, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:29:48,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735250.0, ans=0.1 2024-08-10 20:29:54,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=735250.0, ans=0.0 2024-08-10 20:30:21,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2024-08-10 20:30:32,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=735450.0, ans=0.125 2024-08-10 20:30:39,862 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 from AS 2024-08-10 20:30:43,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-08-10 20:30:45,135 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 18 from Vox, 35 from AS 2024-08-10 20:30:48,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=735550.0, ans=0.05 2024-08-10 20:30:48,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=12.0 2024-08-10 20:30:51,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1100, loss[loss=0.1041, beats_loss=0.014, ecapa_loss=0.0001601, whisper_loss=0.08853, over 22905.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01162, ecapa_loss=0.0002145, whisper_loss=0.09343, over 3815727.14 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 2199023255552.0 2024-08-10 20:31:01,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=735650.0, ans=0.125 2024-08-10 20:31:02,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=735650.0, ans=0.125 2024-08-10 20:31:34,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=735850.0, ans=0.125 2024-08-10 20:31:39,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=735850.0, ans=0.125 2024-08-10 20:31:43,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=735950.0, ans=0.0 2024-08-10 20:31:50,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.214e+01 2.770e+01 3.006e+01 3.661e+01 6.910e+01, threshold=6.012e+01, percent-clipped=1.0 2024-08-10 20:32:00,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=736050.0, ans=0.5 2024-08-10 20:32:05,361 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-10 20:32:10,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=736050.0, ans=0.2 2024-08-10 20:32:15,953 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1150, loss[loss=0.1034, beats_loss=0.01276, ecapa_loss=0.0001809, whisper_loss=0.08882, over 16428.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01161, ecapa_loss=0.0002147, whisper_loss=0.09292, over 3814910.49 frames. ], batch size: 63, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:32:18,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0 2024-08-10 20:32:21,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736150.0, ans=0.1 2024-08-10 20:32:36,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=736250.0, ans=0.125 2024-08-10 20:32:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=736350.0, ans=0.125 2024-08-10 20:32:55,395 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 from AS 2024-08-10 20:33:31,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=736550.0, ans=0.05 2024-08-10 20:33:39,683 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1200, loss[loss=0.08921, beats_loss=0.01289, ecapa_loss=0.0002424, whisper_loss=0.07389, over 17544.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01171, ecapa_loss=0.0002143, whisper_loss=0.09275, over 3820667.15 frames. 
], batch size: 73, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:33:42,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=736650.0, ans=0.2 2024-08-10 20:33:52,623 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 from AS 2024-08-10 20:33:53,926 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 from AS 2024-08-10 20:33:55,559 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 20:34:04,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=736750.0, ans=0.125 2024-08-10 20:34:15,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=736850.0, ans=0.05 2024-08-10 20:34:19,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2024-08-10 20:34:21,979 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 20:34:28,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736950.0, ans=0.1 2024-08-10 20:34:33,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.774e+01 3.140e+01 3.554e+01 5.402e+01, threshold=6.279e+01, percent-clipped=0.0 2024-08-10 20:34:57,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1250, loss[loss=0.1044, beats_loss=0.01164, ecapa_loss=0.000188, whisper_loss=0.09083, over 17702.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01173, ecapa_loss=0.0002145, whisper_loss=0.09316, over 3838241.51 frames. 
], batch size: 68, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:35:01,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=737150.0, ans=0.0 2024-08-10 20:35:29,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=737350.0, ans=0.125 2024-08-10 20:35:34,887 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 20:35:40,901 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 from AS 2024-08-10 20:35:44,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737450.0, ans=0.125 2024-08-10 20:36:11,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2024-08-10 20:36:12,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1300, loss[loss=0.09804, beats_loss=0.01182, ecapa_loss=0.0001787, whisper_loss=0.08443, over 15829.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0116, ecapa_loss=0.0002162, whisper_loss=0.09368, over 3844086.10 frames. ], batch size: 61, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:36:21,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=737650.0, ans=0.0 2024-08-10 20:36:21,966 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
20 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 20:36:27,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737750.0, ans=0.125 2024-08-10 20:36:33,655 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:36:46,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=737850.0, ans=0.2 2024-08-10 20:37:05,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737950.0, ans=0.1 2024-08-10 20:37:08,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.805e+01 3.070e+01 3.591e+01 5.506e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-10 20:37:09,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737950.0, ans=0.1 2024-08-10 20:37:11,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=737950.0, ans=0.0 2024-08-10 20:37:24,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=738050.0, ans=0.2 2024-08-10 20:37:34,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1350, loss[loss=0.1099, beats_loss=0.01038, ecapa_loss=0.000189, whisper_loss=0.09767, over 18376.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01165, ecapa_loss=0.0002142, whisper_loss=0.09393, over 3858463.22 frames. 
], batch size: 70, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:37:40,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=738150.0, ans=0.0 2024-08-10 20:37:42,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=738150.0, ans=0.0 2024-08-10 20:37:54,420 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-10 20:38:07,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.49 vs. limit=10.0 2024-08-10 20:38:08,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738350.0, ans=0.1 2024-08-10 20:38:08,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=738350.0, ans=0.0 2024-08-10 20:38:17,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0 2024-08-10 20:38:32,957 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 33 from Vox, 34 from AS 2024-08-10 20:38:33,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=738450.0, ans=0.125 2024-08-10 20:38:35,664 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 13 from Vox, 38 from AS 2024-08-10 20:38:39,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=738550.0, ans=0.07 2024-08-10 20:38:50,194 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 20:38:52,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-08-10 20:38:56,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1400, loss[loss=0.106, beats_loss=0.0118, ecapa_loss=0.0001772, whisper_loss=0.09247, over 14833.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01155, ecapa_loss=0.0002145, whisper_loss=0.09373, over 3817955.98 frames. ], batch size: 55, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:38:59,939 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-10 20:39:12,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738750.0, ans=0.125 2024-08-10 20:39:28,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-08-10 20:39:32,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738850.0, ans=0.125 2024-08-10 20:39:56,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.641e+01 2.966e+01 3.393e+01 5.160e+01, threshold=5.932e+01, percent-clipped=0.0 2024-08-10 20:40:23,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1450, loss[loss=0.09004, beats_loss=0.011, ecapa_loss=0.00016, whisper_loss=0.07744, over 14581.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01155, ecapa_loss=0.0002133, whisper_loss=0.09275, over 3799731.86 frames. 
], batch size: 56, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:40:57,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=739150.0, ans=0.125 2024-08-10 20:41:17,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=739250.0, ans=0.2 2024-08-10 20:42:07,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=739550.0, ans=0.2 2024-08-10 20:42:17,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739650.0, ans=0.1 2024-08-10 20:42:18,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1500, loss[loss=0.1193, beats_loss=0.009018, ecapa_loss=0.000213, whisper_loss=0.1082, over 23038.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01157, ecapa_loss=0.0002132, whisper_loss=0.09166, over 3823262.28 frames. ], batch size: 89, lr: 1.08e-02, grad_scale: 2199023255552.0 2024-08-10 20:42:36,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=15.0 2024-08-10 20:42:49,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=739850.0, ans=0.0 2024-08-10 20:43:01,468 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
20 from LS+wenet, 17 from Vox, 31 from AS 2024-08-10 20:43:03,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=739850.0, ans=0.05 2024-08-10 20:43:05,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=739950.0, ans=0.0 2024-08-10 20:43:06,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=739950.0, ans=0.0 2024-08-10 20:43:14,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.729e+01 3.073e+01 3.413e+01 6.253e+01, threshold=6.146e+01, percent-clipped=1.0 2024-08-10 20:43:32,337 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 27 from Vox, 27 from AS 2024-08-10 20:43:38,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1550, loss[loss=0.1267, beats_loss=0.01062, ecapa_loss=0.0001891, whisper_loss=0.1142, over 23115.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01159, ecapa_loss=0.0002117, whisper_loss=0.09262, over 3842494.00 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:43:56,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=740250.0, ans=0.1 2024-08-10 20:44:23,703 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 20:44:26,729 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 20 from Vox, 34 from AS 2024-08-10 20:44:26,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=740450.0, ans=0.125 2024-08-10 20:44:47,639 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 20:44:52,604 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
18 from LS+wenet, 23 from Vox, 37 from AS 2024-08-10 20:44:58,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-10 20:45:00,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1600, loss[loss=0.1042, beats_loss=0.01221, ecapa_loss=0.0002133, whisper_loss=0.08982, over 16116.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0116, ecapa_loss=0.0002114, whisper_loss=0.0932, over 3868149.77 frames. ], batch size: 64, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:45:06,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=740650.0, ans=0.0 2024-08-10 20:45:16,379 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 from AS 2024-08-10 20:45:21,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=740750.0, ans=0.125 2024-08-10 20:45:36,394 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 from AS 2024-08-10 20:45:58,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.574e+01 2.929e+01 3.457e+01 5.264e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-10 20:46:23,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1650, loss[loss=0.1126, beats_loss=0.008494, ecapa_loss=0.0002388, whisper_loss=0.1017, over 14268.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01158, ecapa_loss=0.000211, whisper_loss=0.09389, over 3858929.63 frames. ], batch size: 54, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:47:00,666 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-10 20:47:03,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741350.0, ans=0.125 2024-08-10 20:47:19,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=741450.0, ans=0.125 2024-08-10 20:47:40,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1700, loss[loss=0.1147, beats_loss=0.009829, ecapa_loss=0.0001784, whisper_loss=0.1031, over 18883.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01157, ecapa_loss=0.0002121, whisper_loss=0.09365, over 3831056.84 frames. ], batch size: 71, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:47:55,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2024-08-10 20:47:56,252 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 from AS 2024-08-10 20:47:58,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-08-10 20:48:02,587 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 20:48:06,851 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 from AS 2024-08-10 20:48:21,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2024-08-10 20:48:28,243 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-10 20:48:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=741950.0, ans=0.025 2024-08-10 20:48:34,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.737e+01 3.042e+01 3.583e+01 5.597e+01, threshold=6.084e+01, percent-clipped=0.0 2024-08-10 20:48:35,767 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-10 20:48:49,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=742050.0, ans=0.2 2024-08-10 20:48:51,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=742050.0, ans=0.0 2024-08-10 20:48:56,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1750, loss[loss=0.08487, beats_loss=0.01313, ecapa_loss=0.0002181, whisper_loss=0.06956, over 21039.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01156, ecapa_loss=0.0002124, whisper_loss=0.09314, over 3854021.85 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:48:56,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=742150.0, ans=0.0 2024-08-10 20:49:14,482 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-10 20:49:19,023 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-10 20:49:21,192 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-10 20:49:22,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742250.0, ans=0.125 2024-08-10 20:49:35,584 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
16 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-10 20:49:44,898 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 20:49:45,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=742450.0, ans=0.07 2024-08-10 20:49:46,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=742450.0, ans=0.125 2024-08-10 20:49:55,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742550.0, ans=0.1 2024-08-10 20:50:00,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-10 20:50:06,030 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 12 from Vox, 50 fro AS 2024-08-10 20:50:11,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1800, loss[loss=0.1127, beats_loss=0.01057, ecapa_loss=0.0001772, whisper_loss=0.1004, over 23619.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01151, ecapa_loss=0.0002113, whisper_loss=0.09386, over 3845015.79 frames. ], batch size: 90, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:50:25,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=742750.0, ans=10.0 2024-08-10 20:50:37,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=742750.0, ans=0.125 2024-08-10 20:50:39,123 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 20:51:03,023 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 20:51:05,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.668e+01 3.016e+01 3.512e+01 6.004e+01, threshold=6.033e+01, percent-clipped=0.0 2024-08-10 20:51:06,111 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-10 20:51:14,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743050.0, ans=0.125 2024-08-10 20:51:29,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1850, loss[loss=0.1188, beats_loss=0.01146, ecapa_loss=0.0002017, whisper_loss=0.1054, over 23029.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0115, ecapa_loss=0.0002124, whisper_loss=0.09388, over 3862671.18 frames. ], batch size: 91, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:51:33,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743150.0, ans=0.125 2024-08-10 20:51:49,071 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-10 20:51:49,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743250.0, ans=0.1 2024-08-10 20:52:03,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=743350.0, ans=0.125 2024-08-10 20:52:15,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=743450.0, ans=0.125 2024-08-10 20:52:32,067 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 20:52:34,086 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-10 20:52:38,337 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-10 20:52:40,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=743550.0, ans=0.125 2024-08-10 20:52:44,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1900, loss[loss=0.1013, beats_loss=0.01299, ecapa_loss=0.0001724, whisper_loss=0.08657, over 17963.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01151, ecapa_loss=0.0002133, whisper_loss=0.09336, over 3821212.47 frames. ], batch size: 69, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:52:46,071 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-10 20:52:51,823 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 20:52:53,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=743650.0, ans=0.125 2024-08-10 20:53:05,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2024-08-10 20:53:08,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=743750.0, ans=0.2 2024-08-10 20:53:15,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=743850.0, ans=0.0 2024-08-10 20:53:16,277 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-10 20:53:17,730 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-10 20:53:36,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.690e+01 3.145e+01 3.666e+01 6.863e+01, threshold=6.290e+01, percent-clipped=1.0 2024-08-10 20:53:51,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5 2024-08-10 20:54:01,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 1950, loss[loss=0.09747, beats_loss=0.0125, ecapa_loss=0.0002355, whisper_loss=0.08261, over 17899.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01154, ecapa_loss=0.0002146, whisper_loss=0.09284, over 3812843.69 frames. ], batch size: 71, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:54:24,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744250.0, ans=0.125 2024-08-10 20:54:27,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=744250.0, ans=0.125 2024-08-10 20:54:30,854 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-10 20:54:31,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2024-08-10 20:54:34,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=744350.0, ans=0.125 2024-08-10 20:54:54,960 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-10 20:55:00,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=744450.0, ans=0.95 2024-08-10 20:55:05,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=744550.0, ans=0.125 2024-08-10 20:55:18,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2000, loss[loss=0.1111, beats_loss=0.01029, ecapa_loss=0.000233, whisper_loss=0.09849, over 16797.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002178, whisper_loss=0.09329, over 3802822.24 frames. ], batch size: 64, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:56:00,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=744850.0, ans=0.125 2024-08-10 20:56:15,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=744950.0, ans=0.2 2024-08-10 20:56:16,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.753e+01 3.103e+01 3.441e+01 5.353e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-10 20:56:22,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745050.0, ans=0.1 2024-08-10 20:56:31,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-10 20:56:33,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745050.0, ans=0.0 2024-08-10 20:56:42,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2050, loss[loss=0.1082, beats_loss=0.01106, ecapa_loss=0.0002023, whisper_loss=0.0951, over 22465.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01149, ecapa_loss=0.0002188, whisper_loss=0.09369, over 3814719.25 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:56:46,596 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 20:56:46,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=745150.0, ans=0.125 2024-08-10 20:56:57,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=745250.0, ans=0.125 2024-08-10 20:57:01,170 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 20:57:06,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-10 20:57:08,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-10 20:57:38,453 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 20:57:51,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=745550.0, ans=0.125 2024-08-10 20:57:51,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=745550.0, ans=0.0 2024-08-10 20:57:55,097 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-10 20:58:00,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=745550.0, ans=0.125 2024-08-10 20:58:02,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2100, loss[loss=0.1031, beats_loss=0.01372, ecapa_loss=0.000175, whisper_loss=0.08766, over 17136.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01161, ecapa_loss=0.0002178, whisper_loss=0.09292, over 3807393.80 frames. ], batch size: 65, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:58:14,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=745650.0, ans=0.0 2024-08-10 20:58:23,629 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 20:58:44,891 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-10 20:59:06,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.289e+01 2.787e+01 3.226e+01 3.870e+01 7.991e+01, threshold=6.452e+01, percent-clipped=3.0 2024-08-10 20:59:25,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746050.0, ans=0.0 2024-08-10 20:59:31,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2150, loss[loss=0.118, beats_loss=0.0118, ecapa_loss=0.0002282, whisper_loss=0.1039, over 16272.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0117, ecapa_loss=0.0002184, whisper_loss=0.09306, over 3833434.63 frames. 
], batch size: 63, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 20:59:37,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746150.0, ans=0.1 2024-08-10 20:59:49,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=746250.0, ans=0.0 2024-08-10 20:59:54,676 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 20:59:55,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=746250.0, ans=0.0 2024-08-10 21:00:21,774 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 21:00:22,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=746350.0, ans=0.0 2024-08-10 21:00:23,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=746450.0, ans=0.2 2024-08-10 21:00:35,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=746450.0, ans=0.0 2024-08-10 21:00:57,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2200, loss[loss=0.09261, beats_loss=0.01186, ecapa_loss=0.00019, whisper_loss=0.07885, over 15351.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01168, ecapa_loss=0.00022, whisper_loss=0.09337, over 3826774.36 frames. 
], batch size: 61, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:01:00,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746650.0, ans=0.1 2024-08-10 21:01:12,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=746650.0, ans=0.125 2024-08-10 21:01:23,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746750.0, ans=0.1 2024-08-10 21:01:34,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=746850.0, ans=0.0 2024-08-10 21:01:34,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=746850.0, ans=0.2 2024-08-10 21:01:36,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=746850.0, ans=0.125 2024-08-10 21:01:50,582 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 21:01:59,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.667e+01 3.183e+01 3.944e+01 1.052e+02, threshold=6.365e+01, percent-clipped=1.0 2024-08-10 21:02:02,959 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 21:02:10,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.28 vs. 
limit=22.5 2024-08-10 21:02:23,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=747150.0, ans=0.0 2024-08-10 21:02:24,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2250, loss[loss=0.07285, beats_loss=0.01153, ecapa_loss=0.0001847, whisper_loss=0.05948, over 14459.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01167, ecapa_loss=0.0002211, whisper_loss=0.09388, over 3828503.16 frames. ], batch size: 55, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:02:25,932 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-10 21:02:40,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=747250.0, ans=0.07 2024-08-10 21:02:46,173 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-10 21:02:58,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747350.0, ans=0.1 2024-08-10 21:03:01,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=747350.0, ans=0.125 2024-08-10 21:03:19,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=747450.0, ans=0.125 2024-08-10 21:03:25,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0 2024-08-10 21:03:41,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=747550.0, ans=0.2 2024-08-10 21:03:51,116 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2300, loss[loss=0.1081, beats_loss=0.01403, ecapa_loss=0.0001864, whisper_loss=0.09221, over 17020.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01174, ecapa_loss=0.000221, whisper_loss=0.09432, over 3862166.26 frames. ], batch size: 65, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:03:53,019 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 21:03:57,457 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 21:04:06,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=747750.0, ans=0.125 2024-08-10 21:04:07,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=747750.0, ans=0.125 2024-08-10 21:04:33,521 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 21:04:53,104 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.764e+01 3.059e+01 3.552e+01 5.257e+01, threshold=6.118e+01, percent-clipped=0.0 2024-08-10 21:05:05,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748050.0, ans=0.1 2024-08-10 21:05:07,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=748050.0, ans=0.0 2024-08-10 21:05:19,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2350, loss[loss=0.1064, beats_loss=0.01455, ecapa_loss=0.0001906, whisper_loss=0.08993, over 21575.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01175, ecapa_loss=0.000222, whisper_loss=0.09426, over 3845837.69 frames. ], batch size: 84, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:05:24,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=748150.0, ans=0.0 2024-08-10 21:05:29,739 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-10 21:05:36,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0 2024-08-10 21:05:45,083 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 21:05:45,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=748250.0, ans=0.125 2024-08-10 21:06:10,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=748350.0, ans=0.09899494936611666 2024-08-10 21:06:11,832 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-10 21:06:29,297 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 21:07:01,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=748550.0, ans=0.125 2024-08-10 21:07:08,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2400, loss[loss=0.1206, beats_loss=0.009986, ecapa_loss=0.0002864, whisper_loss=0.1078, over 18349.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.0002224, whisper_loss=0.09451, over 3842006.38 frames. ], batch size: 74, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:07:11,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2024-08-10 21:07:14,080 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 21:07:16,598 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
33 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-10 21:07:17,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.33 vs. limit=15.0 2024-08-10 21:07:33,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748750.0, ans=0.125 2024-08-10 21:08:47,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.712e+01 3.107e+01 3.563e+01 2.420e+02, threshold=6.213e+01, percent-clipped=2.0 2024-08-10 21:09:11,525 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-10 21:09:27,514 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-10 21:09:29,382 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2450, loss[loss=0.1091, beats_loss=0.01206, ecapa_loss=0.0002629, whisper_loss=0.0944, over 20427.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01168, ecapa_loss=0.0002243, whisper_loss=0.09368, over 3877892.23 frames. ], batch size: 85, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:09:29,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=749150.0, ans=0.125 2024-08-10 21:09:49,991 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 21:09:54,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=749250.0, ans=0.125 2024-08-10 21:09:54,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749250.0, ans=0.1 2024-08-10 21:10:13,889 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-10 21:10:15,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=749350.0, ans=0.2 2024-08-10 21:10:17,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=749350.0, ans=0.125 2024-08-10 21:10:39,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0 2024-08-10 21:10:40,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=749450.0, ans=0.2 2024-08-10 21:11:01,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2500, loss[loss=0.1043, beats_loss=0.01249, ecapa_loss=0.0001979, whisper_loss=0.08988, over 22335.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01172, ecapa_loss=0.0002263, whisper_loss=0.0936, over 3882842.29 frames. ], batch size: 88, lr: 1.08e-02, grad_scale: 4398046511104.0 2024-08-10 21:11:07,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=749650.0, ans=0.0 2024-08-10 21:11:19,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=749750.0, ans=0.0 2024-08-10 21:11:19,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=749750.0, ans=0.125 2024-08-10 21:11:31,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=749750.0, ans=0.2 2024-08-10 21:11:34,552 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
34 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-10 21:12:03,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+01 2.786e+01 3.132e+01 3.631e+01 5.389e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:12:11,985 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-10 21:12:26,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=750050.0, ans=0.0 2024-08-10 21:12:30,494 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 21:12:32,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2550, loss[loss=0.1135, beats_loss=0.01114, ecapa_loss=0.0002436, whisper_loss=0.09996, over 17925.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01168, ecapa_loss=0.0002244, whisper_loss=0.09369, over 3860324.64 frames. ], batch size: 69, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:12:44,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=750150.0, ans=0.125 2024-08-10 21:12:51,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-10 21:12:57,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750250.0, ans=0.1 2024-08-10 21:13:05,369 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-10 21:13:25,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=750350.0, ans=0.125 2024-08-10 21:13:33,749 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
20 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-10 21:13:49,550 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 21:13:51,412 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-10 21:14:07,932 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2600, loss[loss=0.1114, beats_loss=0.01033, ecapa_loss=0.0002112, whisper_loss=0.09898, over 16186.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01162, ecapa_loss=0.000224, whisper_loss=0.09436, over 3856373.30 frames. ], batch size: 63, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:14:24,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=750750.0, ans=0.125 2024-08-10 21:14:35,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-10 21:14:36,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=750750.0, ans=0.0 2024-08-10 21:14:39,489 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-10 21:14:42,349 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 21:15:09,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.826e+01 3.235e+01 3.900e+01 8.164e+01, threshold=6.470e+01, percent-clipped=1.0 2024-08-10 21:15:13,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=750950.0, ans=0.125 2024-08-10 21:15:31,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=751050.0, ans=0.125 2024-08-10 21:15:33,789 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2650, loss[loss=0.1291, beats_loss=0.009056, ecapa_loss=0.0002545, whisper_loss=0.1175, over 16439.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01159, ecapa_loss=0.0002246, whisper_loss=0.09389, over 3857996.14 frames. ], batch size: 64, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:15:37,016 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-10 21:15:44,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=751150.0, ans=0.2 2024-08-10 21:15:55,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=751250.0, ans=0.5 2024-08-10 21:15:58,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=751250.0, ans=0.1 2024-08-10 21:16:00,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=751250.0, ans=0.125 2024-08-10 21:16:05,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=12.0 2024-08-10 21:16:12,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=751350.0, ans=0.125 2024-08-10 21:16:14,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=751350.0, ans=0.2 2024-08-10 21:16:19,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751350.0, ans=0.0 2024-08-10 21:16:27,364 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 11 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 21:16:39,106 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-10 21:16:57,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=751550.0, ans=0.125 2024-08-10 21:17:02,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2700, loss[loss=0.09888, beats_loss=0.01173, ecapa_loss=0.0001966, whisper_loss=0.08518, over 18268.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01172, ecapa_loss=0.0002254, whisper_loss=0.09318, over 3892050.25 frames. ], batch size: 70, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:17:02,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=751650.0, ans=0.125 2024-08-10 21:17:12,955 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 21:17:28,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751750.0, ans=0.0 2024-08-10 21:17:51,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751850.0, ans=0.1 2024-08-10 21:17:58,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.58 vs. limit=22.5 2024-08-10 21:18:03,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.378e+01 3.016e+01 3.341e+01 3.971e+01 1.144e+02, threshold=6.682e+01, percent-clipped=3.0 2024-08-10 21:18:18,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=752050.0, ans=0.125 2024-08-10 21:18:23,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=22.5 2024-08-10 21:18:28,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2750, loss[loss=0.1274, beats_loss=0.009128, ecapa_loss=0.0002296, whisper_loss=0.116, over 17697.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01175, ecapa_loss=0.0002252, whisper_loss=0.0928, over 3856748.29 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:18:40,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=752150.0, ans=0.125 2024-08-10 21:18:45,023 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-10 21:18:51,987 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 21:18:54,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=752250.0, ans=0.2 2024-08-10 21:19:24,977 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-10 21:19:54,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2800, loss[loss=0.08571, beats_loss=0.01035, ecapa_loss=0.0002118, whisper_loss=0.07324, over 18709.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01173, ecapa_loss=0.0002247, whisper_loss=0.09334, over 3873112.97 frames. ], batch size: 73, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:20:11,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=752750.0, ans=0.125 2024-08-10 21:20:17,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=752750.0, ans=0.0 2024-08-10 21:20:31,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=752850.0, ans=0.125 2024-08-10 21:20:46,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=752950.0, ans=0.95 2024-08-10 21:20:53,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.721e+01 3.078e+01 3.353e+01 6.515e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-10 21:21:05,556 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 21:21:12,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753050.0, ans=0.1 2024-08-10 21:21:17,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=753050.0, ans=0.0 2024-08-10 21:21:20,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2850, loss[loss=0.1215, beats_loss=0.009056, ecapa_loss=0.0002905, whisper_loss=0.1096, over 15847.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01174, ecapa_loss=0.0002233, whisper_loss=0.09331, over 3835372.22 frames. ], batch size: 62, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:22:05,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=753350.0, ans=0.125 2024-08-10 21:22:32,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753450.0, ans=0.125 2024-08-10 21:22:45,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=753550.0, ans=0.05 2024-08-10 21:22:53,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2900, loss[loss=0.1076, beats_loss=0.009868, ecapa_loss=0.00023, whisper_loss=0.09546, over 18233.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01172, ecapa_loss=0.0002255, whisper_loss=0.09342, over 3829963.66 frames. 
], batch size: 72, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:23:00,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=753650.0, ans=0.2 2024-08-10 21:23:21,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=753750.0, ans=0.95 2024-08-10 21:23:27,174 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-10 21:23:36,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.50 vs. limit=10.0 2024-08-10 21:23:48,130 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 30 from Vox, 21 fro AS 2024-08-10 21:23:53,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.758e+01 3.070e+01 3.678e+01 5.521e+01, threshold=6.141e+01, percent-clipped=0.0 2024-08-10 21:23:53,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753950.0, ans=0.1 2024-08-10 21:24:09,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=12.0 2024-08-10 21:24:17,328 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-10 21:24:18,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 2950, loss[loss=0.123, beats_loss=0.01001, ecapa_loss=0.0002777, whisper_loss=0.1102, over 14459.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01173, ecapa_loss=0.0002268, whisper_loss=0.09429, over 3859028.85 frames. 
], batch size: 59, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:24:59,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=754350.0, ans=10.0 2024-08-10 21:25:00,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=754350.0, ans=0.0 2024-08-10 21:25:39,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3000, loss[loss=0.09553, beats_loss=0.01153, ecapa_loss=0.0002851, whisper_loss=0.08115, over 15793.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01169, ecapa_loss=0.0002268, whisper_loss=0.09416, over 3867274.08 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:25:39,065 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 21:26:18,875 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0007066, whisper_loss=0.2527, over 922467.00 frames. 2024-08-10 21:26:38,610 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on SV_voxceleb1: loss=0.005938, beats_loss=0, ecapa_loss=0.0005938, whisper_loss=0, over 939242.00 frames. 2024-08-10 21:28:42,303 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on AT_audioset: loss=0.02614, beats_loss=0.02614, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 21:28:42,308 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 21:28:43,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=754650.0, ans=0.0 2024-08-10 21:29:08,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=754750.0, ans=0.0 2024-08-10 21:29:11,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=754850.0, ans=0.125 2024-08-10 21:29:23,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=754850.0, ans=0.0 2024-08-10 21:29:24,974 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-10 21:29:38,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.874e+01 3.287e+01 3.873e+01 6.300e+01, threshold=6.573e+01, percent-clipped=1.0 2024-08-10 21:30:02,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3050, loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0002071, whisper_loss=0.09057, over 18587.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01169, ecapa_loss=0.0002261, whisper_loss=0.09475, over 3910933.88 frames. ], batch size: 71, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:30:18,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-08-10 21:30:30,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=755250.0, ans=0.125 2024-08-10 21:30:32,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. 
limit=6.0 2024-08-10 21:30:38,931 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-10 21:30:50,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.68 vs. limit=22.5 2024-08-10 21:31:16,899 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-10 21:31:22,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3100, loss[loss=0.1124, beats_loss=0.01128, ecapa_loss=0.0002253, whisper_loss=0.09883, over 16863.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01167, ecapa_loss=0.0002264, whisper_loss=0.09466, over 3881569.23 frames. ], batch size: 68, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:31:23,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=755650.0, ans=0.1 2024-08-10 21:31:28,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=755650.0, ans=0.125 2024-08-10 21:31:46,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=755750.0, ans=0.125 2024-08-10 21:31:55,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=755850.0, ans=0.0 2024-08-10 21:32:06,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=755850.0, ans=0.125 2024-08-10 21:32:06,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755850.0, ans=0.1 2024-08-10 21:32:13,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755950.0, ans=0.125 
2024-08-10 21:32:18,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=755950.0, ans=0.125 2024-08-10 21:32:19,725 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-10 21:32:21,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.626e+01 2.939e+01 3.498e+01 4.571e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-10 21:32:25,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755950.0, ans=0.1 2024-08-10 21:32:26,562 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 21:32:44,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3150, loss[loss=0.116, beats_loss=0.008011, ecapa_loss=0.0002159, whisper_loss=0.1058, over 16880.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01168, ecapa_loss=0.0002261, whisper_loss=0.09413, over 3849590.71 frames. ], batch size: 64, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:32:56,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2024-08-10 21:32:58,801 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-10 21:33:00,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756250.0, ans=0.125 2024-08-10 21:33:15,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=756350.0, ans=0.2 2024-08-10 21:33:20,045 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-10 21:33:34,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.27 vs. limit=22.5 2024-08-10 21:33:40,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2024-08-10 21:34:02,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=756550.0, ans=0.035 2024-08-10 21:34:02,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756550.0, ans=0.1 2024-08-10 21:34:04,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3200, loss[loss=0.1218, beats_loss=0.008636, ecapa_loss=0.0002609, whisper_loss=0.1105, over 15355.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01166, ecapa_loss=0.0002261, whisper_loss=0.09484, over 3853087.79 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:34:09,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=756650.0, ans=0.125 2024-08-10 21:34:11,755 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-10 21:34:12,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=756650.0, ans=0.0 2024-08-10 21:34:15,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-10 21:34:17,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. 
limit=22.5 2024-08-10 21:34:21,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=756750.0, ans=0.0 2024-08-10 21:34:33,114 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-10 21:34:39,937 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 21:34:44,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=756850.0, ans=0.125 2024-08-10 21:34:53,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756950.0, ans=0.125 2024-08-10 21:34:56,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756950.0, ans=0.1 2024-08-10 21:35:01,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=756950.0, ans=0.0 2024-08-10 21:35:03,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.765e+01 3.113e+01 3.844e+01 7.476e+01, threshold=6.225e+01, percent-clipped=4.0 2024-08-10 21:35:12,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.53 vs. limit=22.5 2024-08-10 21:35:15,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=757050.0, ans=0.0 2024-08-10 21:35:16,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757050.0, ans=0.0 2024-08-10 21:35:26,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3250, loss[loss=0.1011, beats_loss=0.01261, ecapa_loss=0.0002268, whisper_loss=0.08622, over 21888.00 frames. 
], tot_loss[loss=0.1087, beats_loss=0.01163, ecapa_loss=0.0002272, whisper_loss=0.09476, over 3848689.34 frames. ], batch size: 89, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:35:27,082 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-10 21:35:47,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=757250.0, ans=0.2 2024-08-10 21:35:48,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-10 21:36:45,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757550.0, ans=0.1 2024-08-10 21:36:50,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3300, loss[loss=0.119, beats_loss=0.01204, ecapa_loss=0.0002509, whisper_loss=0.1044, over 22192.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01174, ecapa_loss=0.0002263, whisper_loss=0.09462, over 3895711.94 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:37:19,456 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-10 21:37:19,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757750.0, ans=0.0 2024-08-10 21:37:35,656 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 21:37:40,469 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 21:37:42,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=757950.0, ans=0.125 2024-08-10 21:37:42,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-10 21:37:45,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757950.0, ans=0.125 2024-08-10 21:37:49,285 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-10 21:37:50,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.783e+01 3.080e+01 3.590e+01 5.176e+01, threshold=6.160e+01, percent-clipped=0.0 2024-08-10 21:37:54,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=12.0 2024-08-10 21:38:08,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=758050.0, ans=0.125 2024-08-10 21:38:14,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=758150.0, ans=0.025 2024-08-10 21:38:15,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3350, loss[loss=0.1005, beats_loss=0.01468, ecapa_loss=0.0001934, whisper_loss=0.08391, over 21818.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01165, ecapa_loss=0.0002279, whisper_loss=0.09515, over 3869535.56 frames. 
], batch size: 88, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:38:16,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758150.0, ans=0.0 2024-08-10 21:38:17,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-10 21:38:19,789 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-10 21:38:35,809 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-10 21:38:39,087 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-10 21:38:41,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2024-08-10 21:38:58,621 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 14 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 21:39:02,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758450.0, ans=0.0 2024-08-10 21:39:08,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758450.0, ans=0.1 2024-08-10 21:39:09,676 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-10 21:39:09,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758450.0, ans=0.125 2024-08-10 21:39:25,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.19 vs. 
limit=12.0 2024-08-10 21:39:33,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3400, loss[loss=0.1158, beats_loss=0.01189, ecapa_loss=0.0002067, whisper_loss=0.1019, over 20919.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01164, ecapa_loss=0.0002277, whisper_loss=0.09532, over 3852632.13 frames. ], batch size: 82, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:39:38,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=758650.0, ans=0.2 2024-08-10 21:39:38,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=758650.0, ans=0.125 2024-08-10 21:39:56,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=758750.0, ans=0.2 2024-08-10 21:40:13,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=758850.0, ans=0.04949747468305833 2024-08-10 21:40:15,917 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-10 21:40:32,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.745e+01 3.132e+01 3.636e+01 5.691e+01, threshold=6.264e+01, percent-clipped=0.0 2024-08-10 21:40:35,528 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 21:40:45,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=759050.0, ans=0.125 2024-08-10 21:40:47,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=759050.0, ans=0.0 2024-08-10 21:40:50,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=759050.0, ans=0.125 2024-08-10 21:40:52,633 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-10 21:40:56,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3450, loss[loss=0.1067, beats_loss=0.01117, ecapa_loss=0.0002282, whisper_loss=0.09323, over 23259.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01161, ecapa_loss=0.0002279, whisper_loss=0.09455, over 3824337.50 frames. ], batch size: 92, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:41:06,006 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-10 21:41:09,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=759150.0, ans=22.5 2024-08-10 21:41:09,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2024-08-10 21:41:13,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=759250.0, ans=0.125 2024-08-10 21:41:15,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-08-10 21:41:36,506 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-10 21:41:46,548 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-10 21:41:47,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-10 21:41:48,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759450.0, ans=0.125 2024-08-10 21:41:48,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=759450.0, ans=0.125 2024-08-10 21:41:50,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=759450.0, ans=0.2 2024-08-10 21:42:06,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=759550.0, ans=0.125 2024-08-10 21:42:07,755 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-10 21:42:19,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3500, loss[loss=0.09538, beats_loss=0.01287, ecapa_loss=0.000175, whisper_loss=0.08076, over 15670.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01154, ecapa_loss=0.0002297, whisper_loss=0.0949, over 3850093.69 frames. ], batch size: 61, lr: 1.07e-02, grad_scale: 4398046511104.0 2024-08-10 21:42:21,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=759650.0, ans=0.0 2024-08-10 21:42:26,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=759650.0, ans=22.5 2024-08-10 21:42:31,046 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
17 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-10 21:42:37,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=759750.0, ans=0.125 2024-08-10 21:43:07,752 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-76000.pt 2024-08-10 21:43:11,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.666e+01 2.958e+01 3.304e+01 6.870e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-10 21:43:13,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=759950.0, ans=0.2 2024-08-10 21:43:31,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3550, loss[loss=0.1072, beats_loss=0.01125, ecapa_loss=0.0002231, whisper_loss=0.09372, over 19990.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01162, ecapa_loss=0.0002281, whisper_loss=0.09498, over 3897017.03 frames. ], batch size: 80, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:44:06,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:44:14,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=760450.0, ans=0.125 2024-08-10 21:44:23,038 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-10 21:44:25,468 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-10 21:44:32,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. 
limit=12.0 2024-08-10 21:44:34,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=760550.0, ans=0.0 2024-08-10 21:44:36,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3600, loss[loss=0.1012, beats_loss=0.01655, ecapa_loss=0.0002019, whisper_loss=0.0826, over 21301.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01158, ecapa_loss=0.0002269, whisper_loss=0.0954, over 3911736.23 frames. ], batch size: 89, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:44:38,655 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 18 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-10 21:44:54,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=760750.0, ans=0.0 2024-08-10 21:45:03,794 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-10 21:45:09,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=760850.0, ans=0.0 2024-08-10 21:45:22,561 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-10 21:45:23,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.660e+01 3.011e+01 3.359e+01 4.667e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-10 21:45:24,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=760950.0, ans=0.125 2024-08-10 21:45:28,968 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 21:45:43,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3650, loss[loss=0.1049, beats_loss=0.01326, ecapa_loss=0.0001909, whisper_loss=0.08974, over 20346.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01165, ecapa_loss=0.0002255, whisper_loss=0.09469, over 3869040.09 frames. 
], batch size: 81, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:45:43,776 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-10 21:46:07,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=761250.0, ans=0.125 2024-08-10 21:46:19,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761350.0, ans=0.1 2024-08-10 21:46:22,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-10 21:46:48,750 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3700, loss[loss=0.1196, beats_loss=0.0124, ecapa_loss=0.0002162, whisper_loss=0.105, over 24604.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01171, ecapa_loss=0.0002243, whisper_loss=0.09464, over 3832096.23 frames. ], batch size: 97, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:46:52,855 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 21:46:54,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2024-08-10 21:46:58,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=12.0 2024-08-10 21:46:59,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=761650.0, ans=0.125 2024-08-10 21:47:04,859 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 33 from Vox, 32 fro AS 2024-08-10 21:47:10,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=761750.0, ans=0.2 2024-08-10 21:47:19,300 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-10 21:47:26,065 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-10 21:47:26,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=761850.0, ans=0.125 2024-08-10 21:47:30,095 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-10 21:47:35,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.698e+01 3.015e+01 3.307e+01 5.689e+01, threshold=6.030e+01, percent-clipped=0.0 2024-08-10 21:47:55,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3750, loss[loss=0.08048, beats_loss=0.01441, ecapa_loss=0.0001868, whisper_loss=0.0642, over 16638.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01165, ecapa_loss=0.0002255, whisper_loss=0.09503, over 3859870.44 frames. ], batch size: 66, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:47:55,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-10 21:48:02,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=762150.0, ans=0.2 2024-08-10 21:48:05,012 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-10 21:48:06,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=762150.0, ans=0.0 2024-08-10 21:48:11,599 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
14 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-10 21:48:18,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-08-10 21:48:26,136 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-10 21:48:28,727 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-10 21:48:28,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=762350.0, ans=0.0 2024-08-10 21:48:30,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=762350.0, ans=0.025 2024-08-10 21:48:48,225 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 36 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-10 21:48:48,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=762550.0, ans=0.0 2024-08-10 21:49:00,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=762650.0, ans=0.125 2024-08-10 21:49:01,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3800, loss[loss=0.1233, beats_loss=0.009671, ecapa_loss=0.0002439, whisper_loss=0.1112, over 17455.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01167, ecapa_loss=0.0002253, whisper_loss=0.09549, over 3889954.10 frames. ], batch size: 69, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:49:02,773 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 21:49:03,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=762650.0, ans=0.0 2024-08-10 21:49:18,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=762750.0, ans=0.0 2024-08-10 21:49:30,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=762850.0, ans=0.125 2024-08-10 21:49:31,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=762850.0, ans=0.0 2024-08-10 21:49:39,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-10 21:49:42,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-08-10 21:49:47,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.822e+01 3.123e+01 3.627e+01 5.849e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-10 21:49:50,208 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-10 21:50:06,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763150.0, ans=0.1 2024-08-10 21:50:07,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3850, loss[loss=0.1126, beats_loss=0.01098, ecapa_loss=0.0002043, whisper_loss=0.09962, over 22906.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01166, ecapa_loss=0.0002252, whisper_loss=0.09544, over 3904921.88 frames. 
], batch size: 90, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:50:15,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763150.0, ans=0.1 2024-08-10 21:50:23,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=763250.0, ans=0.2 2024-08-10 21:50:34,866 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-10 21:50:35,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=763350.0, ans=0.125 2024-08-10 21:50:38,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=763350.0, ans=0.125 2024-08-10 21:50:40,417 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-10 21:50:42,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=763350.0, ans=0.125 2024-08-10 21:50:43,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=763350.0, ans=0.125 2024-08-10 21:50:47,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=763450.0, ans=0.1 2024-08-10 21:50:47,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=763450.0, ans=0.0 2024-08-10 21:50:48,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=763450.0, ans=0.125 2024-08-10 21:51:01,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=763550.0, ans=0.0 2024-08-10 21:51:10,806 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763550.0, ans=0.1 2024-08-10 21:51:12,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3900, loss[loss=0.1304, beats_loss=0.007899, ecapa_loss=0.0002001, whisper_loss=0.1205, over 23742.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01162, ecapa_loss=0.0002242, whisper_loss=0.09568, over 3891207.45 frames. ], batch size: 84, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:51:20,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-08-10 21:51:31,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=763750.0, ans=15.0 2024-08-10 21:51:35,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=763750.0, ans=0.5 2024-08-10 21:51:49,537 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 21:51:53,475 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 21:51:57,943 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 21:51:58,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.868e+01 3.190e+01 3.521e+01 6.195e+01, threshold=6.380e+01, percent-clipped=0.0 2024-08-10 21:52:01,702 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-10 21:52:05,592 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
13 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 21:52:10,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=764050.0, ans=0.125 2024-08-10 21:52:12,958 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-10 21:52:14,112 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-10 21:52:17,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 3950, loss[loss=0.1199, beats_loss=0.009596, ecapa_loss=0.0002553, whisper_loss=0.1078, over 21749.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01162, ecapa_loss=0.0002254, whisper_loss=0.09553, over 3917971.93 frames. ], batch size: 91, lr: 1.07e-02, grad_scale: 8796093022208.0 2024-08-10 21:52:18,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=764150.0, ans=0.125 2024-08-10 21:52:25,871 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-10 21:52:31,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=764250.0, ans=0.2 2024-08-10 21:52:38,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=12.0 2024-08-10 21:52:42,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764250.0, ans=0.1 2024-08-10 21:52:52,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. 
limit=15.0 2024-08-10 21:52:53,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=764350.0, ans=0.2 2024-08-10 21:52:53,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=764350.0, ans=0.125 2024-08-10 21:53:20,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=764550.0, ans=0.0 2024-08-10 21:53:21,743 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.973e-02 2024-08-10 21:53:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=764650.0, ans=0.125 2024-08-10 21:53:23,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=764650.0, ans=15.0 2024-08-10 21:53:24,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4000, loss[loss=0.1081, beats_loss=0.009671, ecapa_loss=0.0002964, whisper_loss=0.09547, over 19051.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01158, ecapa_loss=0.0002264, whisper_loss=0.09561, over 3924091.08 frames. ], batch size: 77, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:53:27,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=764650.0, ans=0.0 2024-08-10 21:53:34,290 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-10 21:53:36,970 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-10 21:53:41,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. 
limit=15.0 2024-08-10 21:53:48,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2024-08-10 21:53:51,216 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-10 21:53:57,756 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-10 21:54:02,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=764950.0, ans=0.0 2024-08-10 21:54:09,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.764e+01 3.105e+01 3.573e+01 5.750e+01, threshold=6.210e+01, percent-clipped=0.0 2024-08-10 21:54:12,328 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-10 21:54:13,598 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-10 21:54:16,222 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-10 21:54:29,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4050, loss[loss=0.09903, beats_loss=0.01317, ecapa_loss=0.0002636, whisper_loss=0.08322, over 20515.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01162, ecapa_loss=0.0002261, whisper_loss=0.09456, over 3907973.37 frames. 
], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:54:31,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=765150.0, ans=0.125 2024-08-10 21:54:31,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=765150.0, ans=0.125 2024-08-10 21:54:41,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=765250.0, ans=0.1 2024-08-10 21:54:41,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=765250.0, ans=0.0 2024-08-10 21:54:46,262 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-10 21:54:57,925 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 21:55:11,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=765450.0, ans=0.125 2024-08-10 21:55:15,070 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-10 21:55:28,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=765550.0, ans=0.125 2024-08-10 21:55:31,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=765550.0, ans=0.125 2024-08-10 21:55:31,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2024-08-10 21:55:34,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4100, loss[loss=0.1106, beats_loss=0.01021, ecapa_loss=0.0002375, whisper_loss=0.09803, over 20803.00 frames. 
], tot_loss[loss=0.1083, beats_loss=0.01165, ecapa_loss=0.0002253, whisper_loss=0.09438, over 3885975.12 frames. ], batch size: 85, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:55:40,022 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 21:55:52,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=765750.0, ans=0.0 2024-08-10 21:56:06,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=765850.0, ans=0.125 2024-08-10 21:56:11,387 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-10 21:56:20,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.758e+01 3.048e+01 3.457e+01 5.910e+01, threshold=6.096e+01, percent-clipped=0.0 2024-08-10 21:56:30,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=766050.0, ans=15.0 2024-08-10 21:56:37,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=12.0 2024-08-10 21:56:40,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4150, loss[loss=0.08955, beats_loss=0.01259, ecapa_loss=0.0002176, whisper_loss=0.07478, over 15252.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0117, ecapa_loss=0.0002231, whisper_loss=0.09448, over 3868278.44 frames. 
], batch size: 65, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:56:52,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=766250.0, ans=0.025 2024-08-10 21:56:59,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=766250.0, ans=10.0 2024-08-10 21:57:00,552 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-10 21:57:21,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=766450.0, ans=0.0 2024-08-10 21:57:31,351 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-10 21:57:36,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=766550.0, ans=0.125 2024-08-10 21:57:39,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=766550.0, ans=0.125 2024-08-10 21:57:46,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4200, loss[loss=0.1032, beats_loss=0.01102, ecapa_loss=0.000226, whisper_loss=0.08995, over 18483.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01166, ecapa_loss=0.0002232, whisper_loss=0.09393, over 3894014.57 frames. ], batch size: 71, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:57:52,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5 2024-08-10 21:57:53,966 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-10 21:57:54,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. 
limit=15.0 2024-08-10 21:57:56,785 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 21:58:04,834 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-10 21:58:26,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2024-08-10 21:58:29,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766950.0, ans=0.0 2024-08-10 21:58:31,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.720e+01 3.062e+01 3.636e+01 5.115e+01, threshold=6.123e+01, percent-clipped=0.0 2024-08-10 21:58:41,189 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-10 21:58:51,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4250, loss[loss=0.1041, beats_loss=0.01143, ecapa_loss=0.0002216, whisper_loss=0.09043, over 16029.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01168, ecapa_loss=0.0002234, whisper_loss=0.09361, over 3892920.60 frames. ], batch size: 62, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 21:58:51,525 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-10 21:58:53,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767150.0, ans=0.125 2024-08-10 21:59:09,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767250.0, ans=0.1 2024-08-10 21:59:19,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=767350.0, ans=0.125 2024-08-10 21:59:23,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=767350.0, ans=0.125 2024-08-10 21:59:31,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=767450.0, ans=10.0 2024-08-10 21:59:43,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=767550.0, ans=10.0 2024-08-10 21:59:57,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4300, loss[loss=0.0889, beats_loss=0.01395, ecapa_loss=0.0002047, whisper_loss=0.0729, over 17426.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0116, ecapa_loss=0.0002241, whisper_loss=0.09354, over 3867368.49 frames. ], batch size: 71, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:00:11,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=767750.0, ans=0.125 2024-08-10 22:00:13,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=767750.0, ans=0.125 2024-08-10 22:00:20,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=767750.0, ans=0.0 2024-08-10 22:00:26,771 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-10 22:00:43,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.655e+01 2.968e+01 3.386e+01 7.323e+01, threshold=5.937e+01, percent-clipped=2.0 2024-08-10 22:00:57,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=768050.0, ans=0.125 2024-08-10 22:01:03,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4350, loss[loss=0.1075, beats_loss=0.01066, ecapa_loss=0.0002165, whisper_loss=0.09473, over 16843.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01161, ecapa_loss=0.0002245, whisper_loss=0.09281, over 3836252.83 frames. ], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:01:08,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=768150.0, ans=0.0 2024-08-10 22:01:15,828 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-10 22:01:16,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=768250.0, ans=0.125 2024-08-10 22:02:08,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4400, loss[loss=0.09445, beats_loss=0.01182, ecapa_loss=0.0002096, whisper_loss=0.08053, over 14010.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01165, ecapa_loss=0.0002231, whisper_loss=0.09377, over 3869013.51 frames. ], batch size: 54, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:02:12,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.92 vs. 
limit=10.0 2024-08-10 22:02:16,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=768650.0, ans=0.125 2024-08-10 22:02:27,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=768750.0, ans=0.125 2024-08-10 22:02:28,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=768750.0, ans=0.04949747468305833 2024-08-10 22:02:34,976 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-10 22:02:39,336 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-10 22:02:43,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=768850.0, ans=0.125 2024-08-10 22:02:45,998 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 9 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-10 22:02:47,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=12.0 2024-08-10 22:02:55,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+01 2.875e+01 3.279e+01 3.849e+01 6.433e+01, threshold=6.559e+01, percent-clipped=3.0 2024-08-10 22:03:10,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=769050.0, ans=0.09899494936611666 2024-08-10 22:03:11,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=769050.0, ans=0.125 2024-08-10 22:03:14,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4450, loss[loss=0.1029, beats_loss=0.01183, ecapa_loss=0.0002348, whisper_loss=0.08872, over 22200.00 frames. 
], tot_loss[loss=0.1077, beats_loss=0.01171, ecapa_loss=0.000222, whisper_loss=0.09373, over 3894462.01 frames. ], batch size: 94, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:03:16,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=769150.0, ans=0.125 2024-08-10 22:03:20,365 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-10 22:03:24,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2024-08-10 22:03:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=769250.0, ans=0.2 2024-08-10 22:03:55,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=769450.0, ans=0.0 2024-08-10 22:04:04,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=769450.0, ans=0.125 2024-08-10 22:04:17,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=769550.0, ans=0.0 2024-08-10 22:04:20,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4500, loss[loss=0.1055, beats_loss=0.01039, ecapa_loss=0.0002516, whisper_loss=0.09264, over 18780.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01167, ecapa_loss=0.0002221, whisper_loss=0.0941, over 3916688.53 frames. 
], batch size: 79, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:04:22,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=769650.0, ans=0.07 2024-08-10 22:04:26,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=769650.0, ans=0.1 2024-08-10 22:04:51,283 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-10 22:05:02,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=769950.0, ans=0.0 2024-08-10 22:05:05,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.643e+01 3.102e+01 3.659e+01 7.014e+01, threshold=6.204e+01, percent-clipped=1.0 2024-08-10 22:05:25,019 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4550, loss[loss=0.1349, beats_loss=0.01082, ecapa_loss=0.0002394, whisper_loss=0.1217, over 18159.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01163, ecapa_loss=0.000223, whisper_loss=0.09456, over 3926569.50 frames. ], batch size: 74, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:05:25,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=770150.0, ans=0.0 2024-08-10 22:05:39,903 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 22:05:42,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=770250.0, ans=0.2 2024-08-10 22:05:44,814 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-10 22:05:55,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770350.0, ans=0.125 2024-08-10 22:06:02,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.70 vs. limit=22.5 2024-08-10 22:06:03,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-10 22:06:10,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=770450.0, ans=0.125 2024-08-10 22:06:11,871 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 22:06:20,069 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-10 22:06:30,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4600, loss[loss=0.1068, beats_loss=0.01367, ecapa_loss=0.0002133, whisper_loss=0.09103, over 21392.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.0117, ecapa_loss=0.0002217, whisper_loss=0.09445, over 3934356.35 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:06:31,725 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-10 22:06:33,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=770650.0, ans=0.125 2024-08-10 22:06:38,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770650.0, ans=0.1 2024-08-10 22:06:38,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770650.0, ans=0.1 2024-08-10 22:06:39,994 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-10 22:07:01,258 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-10 22:07:14,141 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-10 22:07:16,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.935e+01 3.290e+01 3.824e+01 6.429e+01, threshold=6.581e+01, percent-clipped=1.0 2024-08-10 22:07:36,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4650, loss[loss=0.128, beats_loss=0.01056, ecapa_loss=0.0002379, whisper_loss=0.1151, over 19767.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0118, ecapa_loss=0.000223, whisper_loss=0.09355, over 3932614.85 frames. ], batch size: 79, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:07:39,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=771150.0, ans=0.05 2024-08-10 22:07:47,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=771150.0, ans=0.125 2024-08-10 22:07:48,849 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 22:08:17,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771450.0, ans=0.1 2024-08-10 22:08:27,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=771450.0, ans=0.07 2024-08-10 22:08:28,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=771550.0, ans=0.125 2024-08-10 22:08:33,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=771550.0, ans=0.125 2024-08-10 22:08:43,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4700, loss[loss=0.1094, beats_loss=0.01192, ecapa_loss=0.0002425, whisper_loss=0.09502, over 20304.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01177, ecapa_loss=0.0002225, whisper_loss=0.09413, over 3915555.16 frames. ], batch size: 85, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:08:43,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=771650.0, ans=0.125 2024-08-10 22:08:53,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=771650.0, ans=0.2 2024-08-10 22:08:56,554 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-10 22:09:07,446 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-10 22:09:09,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771850.0, ans=0.1 2024-08-10 22:09:19,562 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 22:09:26,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=771950.0, ans=0.2 2024-08-10 22:09:29,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.713e+01 3.049e+01 3.532e+01 5.514e+01, threshold=6.097e+01, percent-clipped=0.0 2024-08-10 22:09:49,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4750, loss[loss=0.1284, beats_loss=0.01102, ecapa_loss=0.000219, whisper_loss=0.1152, over 22455.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01171, ecapa_loss=0.0002225, whisper_loss=0.09478, over 3921325.04 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:09:53,339 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 22:09:59,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=772150.0, ans=0.09899494936611666 2024-08-10 22:10:00,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=772150.0, ans=0.125 2024-08-10 22:10:02,889 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-10 22:10:03,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=772250.0, ans=0.125 2024-08-10 22:10:03,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772250.0, ans=0.125 2024-08-10 22:10:09,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=772250.0, ans=0.0 2024-08-10 22:10:22,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=772350.0, ans=0.0 2024-08-10 22:10:25,742 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-10 22:10:30,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=772450.0, ans=0.0 2024-08-10 22:10:40,980 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 22:10:53,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=772550.0, ans=0.0 2024-08-10 22:10:55,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4800, loss[loss=0.107, beats_loss=0.01172, ecapa_loss=0.0002469, whisper_loss=0.09278, over 17058.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01179, ecapa_loss=0.0002222, whisper_loss=0.09436, over 3915677.33 frames. ], batch size: 72, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:11:00,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=772650.0, ans=0.0 2024-08-10 22:11:03,602 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-10 22:11:16,672 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-10 22:11:17,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=772750.0, ans=0.125 2024-08-10 22:11:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=772850.0, ans=0.2 2024-08-10 22:11:41,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.673e+01 3.089e+01 3.492e+01 5.456e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-10 22:11:43,029 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-10 22:11:44,122 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-10 22:12:00,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4850, loss[loss=0.08776, beats_loss=0.01407, ecapa_loss=0.0002008, whisper_loss=0.07169, over 19654.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01178, ecapa_loss=0.0002218, whisper_loss=0.09349, over 3881289.19 frames. ], batch size: 78, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:12:14,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-08-10 22:12:19,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=773250.0, ans=0.125 2024-08-10 22:12:30,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=10.0 2024-08-10 22:12:34,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=773350.0, ans=0.125 2024-08-10 22:12:38,129 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-10 22:12:39,384 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 22:12:40,780 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-10 22:12:53,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=773550.0, ans=0.2 2024-08-10 22:13:06,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4900, loss[loss=0.1011, beats_loss=0.01249, ecapa_loss=0.0002082, whisper_loss=0.08656, over 19811.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01174, ecapa_loss=0.0002227, whisper_loss=0.09425, over 3883821.01 frames. ], batch size: 78, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:13:24,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=773750.0, ans=0.0 2024-08-10 22:13:29,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=773750.0, ans=0.0 2024-08-10 22:13:40,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=773850.0, ans=0.0 2024-08-10 22:13:53,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+01 2.882e+01 3.230e+01 4.059e+01 7.454e+01, threshold=6.460e+01, percent-clipped=3.0 2024-08-10 22:13:53,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=773950.0, ans=0.125 2024-08-10 22:13:57,361 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-10 22:14:06,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. 
limit=6.0 2024-08-10 22:14:12,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 4950, loss[loss=0.09965, beats_loss=0.00951, ecapa_loss=0.0002528, whisper_loss=0.08762, over 13867.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01169, ecapa_loss=0.0002239, whisper_loss=0.09457, over 3874350.83 frames. ], batch size: 58, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:14:22,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=15.0 2024-08-10 22:14:23,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=774150.0, ans=0.2 2024-08-10 22:14:35,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=774250.0, ans=0.07 2024-08-10 22:14:37,870 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:14:43,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=774350.0, ans=0.125 2024-08-10 22:14:48,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774350.0, ans=0.1 2024-08-10 22:14:52,132 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-10 22:15:02,286 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-10 22:15:02,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=774450.0, ans=10.0 2024-08-10 22:15:09,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.32 vs. 
limit=15.0 2024-08-10 22:15:10,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774550.0, ans=0.1 2024-08-10 22:15:18,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5000, loss[loss=0.0935, beats_loss=0.01404, ecapa_loss=0.0002522, whisper_loss=0.07694, over 17038.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01176, ecapa_loss=0.0002217, whisper_loss=0.09406, over 3899634.61 frames. ], batch size: 74, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:15:19,926 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 22:15:20,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=774650.0, ans=0.2 2024-08-10 22:15:24,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-10 22:15:38,232 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 22:15:42,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774750.0, ans=0.0 2024-08-10 22:15:57,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=774850.0, ans=0.05 2024-08-10 22:16:07,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.262e+01 2.646e+01 2.904e+01 3.171e+01 4.689e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-10 22:16:22,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=775050.0, ans=0.125 2024-08-10 22:16:26,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775050.0, ans=0.1 2024-08-10 22:16:30,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5050, loss[loss=0.09125, beats_loss=0.01466, ecapa_loss=0.000256, whisper_loss=0.07404, over 17687.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01173, ecapa_loss=0.0002219, whisper_loss=0.09496, over 3902514.53 frames. ], batch size: 74, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:16:35,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=775150.0, ans=0.05 2024-08-10 22:16:56,390 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:17:06,176 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-10 22:17:15,714 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 22:17:44,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=775550.0, ans=0.0 2024-08-10 22:17:47,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5100, loss[loss=0.1142, beats_loss=0.01159, ecapa_loss=0.0002094, whisper_loss=0.1005, over 21409.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01171, ecapa_loss=0.0002224, whisper_loss=0.09512, over 3896061.25 frames. ], batch size: 86, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:17:48,981 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-10 22:18:14,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=775750.0, ans=22.5 2024-08-10 22:18:38,621 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 22:18:41,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=775950.0, ans=0.2 2024-08-10 22:18:47,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=775950.0, ans=0.125 2024-08-10 22:18:51,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.763e+01 3.180e+01 3.560e+01 6.035e+01, threshold=6.359e+01, percent-clipped=1.0 2024-08-10 22:19:02,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=776050.0, ans=0.05 2024-08-10 22:19:04,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=776050.0, ans=0.0 2024-08-10 22:19:06,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776050.0, ans=0.1 2024-08-10 22:19:08,623 INFO 
[scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-08-10 22:19:16,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5150, loss[loss=0.1237, beats_loss=0.01194, ecapa_loss=0.0002311, whisper_loss=0.1094, over 13997.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01173, ecapa_loss=0.0002223, whisper_loss=0.09428, over 3885211.51 frames. ], batch size: 55, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:19:49,554 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-10 22:19:58,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776350.0, ans=0.1 2024-08-10 22:20:26,821 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-10 22:20:53,186 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-10 22:21:03,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5200, loss[loss=0.1025, beats_loss=0.01118, ecapa_loss=0.000199, whisper_loss=0.08933, over 22362.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01176, ecapa_loss=0.0002206, whisper_loss=0.09418, over 3890496.69 frames. ], batch size: 88, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:21:28,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-10 22:21:32,998 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-10 22:21:35,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=776750.0, ans=0.125 2024-08-10 22:21:44,337 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-10 22:21:46,219 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-10 22:22:11,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.788e+01 3.076e+01 3.692e+01 5.822e+01, threshold=6.152e+01, percent-clipped=0.0 2024-08-10 22:22:42,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5250, loss[loss=0.124, beats_loss=0.01008, ecapa_loss=0.0002432, whisper_loss=0.1114, over 22204.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01164, ecapa_loss=0.0002211, whisper_loss=0.09465, over 3905606.33 frames. ], batch size: 89, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:22:58,228 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 17 from LS+wenet, 30 from Vox, 47 fro AS 2024-08-10 22:23:07,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=777250.0, ans=0.2 2024-08-10 22:23:24,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=777250.0, ans=0.125 2024-08-10 22:23:39,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-10 22:23:45,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=777350.0, ans=0.2 2024-08-10 22:23:47,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2024-08-10 22:23:49,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. 
limit=22.5 2024-08-10 22:23:52,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777450.0, ans=0.1 2024-08-10 22:23:52,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2024-08-10 22:24:28,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=777550.0, ans=0.125 2024-08-10 22:24:35,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=777650.0, ans=0.2 2024-08-10 22:24:38,053 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5300, loss[loss=0.1086, beats_loss=0.01259, ecapa_loss=0.0001959, whisper_loss=0.094, over 18408.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01157, ecapa_loss=0.0002218, whisper_loss=0.09485, over 3874323.66 frames. ], batch size: 71, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:26:02,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.845e+01 3.130e+01 3.652e+01 5.218e+01, threshold=6.259e+01, percent-clipped=0.0 2024-08-10 22:26:13,618 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-10 22:26:31,791 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-10 22:26:38,968 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5350, loss[loss=0.1076, beats_loss=0.01169, ecapa_loss=0.0002388, whisper_loss=0.09352, over 16149.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01162, ecapa_loss=0.000221, whisper_loss=0.09407, over 3864240.78 frames. 
], batch size: 66, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:26:39,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-08-10 22:26:41,618 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 22:27:26,040 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-10 22:27:53,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=778450.0, ans=0.1 2024-08-10 22:28:08,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=778450.0, ans=0.0 2024-08-10 22:28:12,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=778550.0, ans=0.125 2024-08-10 22:28:13,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-08-10 22:28:27,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5400, loss[loss=0.0871, beats_loss=0.01153, ecapa_loss=0.0001837, whisper_loss=0.07372, over 16090.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0116, ecapa_loss=0.0002211, whisper_loss=0.09447, over 3888913.73 frames. ], batch size: 63, lr: 1.06e-02, grad_scale: 8796093022208.0 2024-08-10 22:28:57,147 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-10 22:28:58,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=778850.0, ans=0.125 2024-08-10 22:29:09,557 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
20 from LS+wenet, 35 from Vox, 36 fro AS 2024-08-10 22:29:12,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=778850.0, ans=0.125 2024-08-10 22:29:24,780 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.711e+01 3.100e+01 3.573e+01 5.377e+01, threshold=6.200e+01, percent-clipped=0.0 2024-08-10 22:29:29,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778950.0, ans=0.0 2024-08-10 22:29:38,321 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-10 22:29:50,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5450, loss[loss=0.09599, beats_loss=0.01196, ecapa_loss=0.0002263, whisper_loss=0.08177, over 22121.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01165, ecapa_loss=0.0002217, whisper_loss=0.09362, over 3891735.96 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:29:50,965 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-10 22:30:05,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.18 vs. limit=15.0 2024-08-10 22:30:20,098 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-10 22:30:20,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779250.0, ans=0.1 2024-08-10 22:30:24,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=779250.0, ans=0.125 2024-08-10 22:30:53,289 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-10 22:30:53,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=779450.0, ans=0.025 2024-08-10 22:31:05,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779550.0, ans=0.1 2024-08-10 22:31:05,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779550.0, ans=0.125 2024-08-10 22:31:16,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-08-10 22:31:17,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779550.0, ans=0.1 2024-08-10 22:31:23,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5500, loss[loss=0.1147, beats_loss=0.008225, ecapa_loss=0.0002439, whisper_loss=0.104, over 18077.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002232, whisper_loss=0.09417, over 3886136.00 frames. ], batch size: 71, lr: 1.05e-02, grad_scale: 8796093022208.0 2024-08-10 22:31:24,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779650.0, ans=0.1 2024-08-10 22:31:48,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2024-08-10 22:31:49,883 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-10 22:32:29,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+01 2.640e+01 3.152e+01 3.892e+01 6.209e+01, threshold=6.304e+01, percent-clipped=1.0 2024-08-10 22:32:34,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=779950.0, ans=0.05 2024-08-10 22:32:35,436 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-10 22:32:41,628 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-10 22:32:41,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=780050.0, ans=0.04949747468305833 2024-08-10 22:32:58,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5550, loss[loss=0.1164, beats_loss=0.01148, ecapa_loss=0.0002412, whisper_loss=0.1025, over 22850.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01168, ecapa_loss=0.0002224, whisper_loss=0.09382, over 3888518.99 frames. ], batch size: 94, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:33:02,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=780150.0, ans=0.125 2024-08-10 22:33:05,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 22:33:18,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780250.0, ans=0.1 2024-08-10 22:33:33,607 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-10 22:33:36,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.36 vs. 
limit=22.5 2024-08-10 22:33:38,151 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 25 from Vox, 20 from AS 2024-08-10 22:33:52,470 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS 2024-08-10 22:34:26,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-10 22:34:31,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780650.0, ans=0.1 2024-08-10 22:34:33,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5600, loss[loss=0.1092, beats_loss=0.01009, ecapa_loss=0.0002333, whisper_loss=0.0968, over 22378.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01165, ecapa_loss=0.0002215, whisper_loss=0.09445, over 3929913.07 frames. ], batch size: 89, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:34:33,269 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 16 from Vox, 43 from AS 2024-08-10 22:34:55,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=780750.0, ans=0.125 2024-08-10 22:35:09,494 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
30 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 22:35:29,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=780950.0, ans=0.0 2024-08-10 22:35:36,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.826e+01 3.158e+01 3.731e+01 5.525e+01, threshold=6.316e+01, percent-clipped=0.0 2024-08-10 22:35:49,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=781050.0, ans=0.0 2024-08-10 22:36:04,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5650, loss[loss=0.1206, beats_loss=0.01145, ecapa_loss=0.0002479, whisper_loss=0.1067, over 18647.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01176, ecapa_loss=0.0002215, whisper_loss=0.09373, over 3914579.31 frames. ], batch size: 75, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:36:04,331 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 from AS 2024-08-10 22:36:35,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2024-08-10 22:36:37,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=781250.0, ans=0.125 2024-08-10 22:37:07,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2024-08-10 22:37:30,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=781550.0, ans=0.0 2024-08-10 22:37:34,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.35 vs. 
limit=10.0 2024-08-10 22:37:35,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5700, loss[loss=0.1027, beats_loss=0.009934, ecapa_loss=0.0002678, whisper_loss=0.09008, over 17352.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0118, ecapa_loss=0.0002231, whisper_loss=0.09315, over 3933201.31 frames. ], batch size: 73, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:37:42,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=781650.0, ans=0.0 2024-08-10 22:37:50,400 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 31 from Vox, 35 from AS 2024-08-10 22:38:35,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=781950.0, ans=0.0 2024-08-10 22:38:36,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=781950.0, ans=0.125 2024-08-10 22:38:38,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=781950.0, ans=0.125 2024-08-10 22:38:40,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+01 2.917e+01 3.187e+01 3.836e+01 6.311e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 22:38:59,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=782050.0, ans=0.125 2024-08-10 22:39:06,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5750, loss[loss=0.08596, beats_loss=0.009884, ecapa_loss=0.0002634, whisper_loss=0.07344, over 13176.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01174, ecapa_loss=0.0002244, whisper_loss=0.09355, over 3898536.86 frames. 
], batch size: 55, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:39:13,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=782150.0, ans=0.125 2024-08-10 22:39:18,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=782150.0, ans=0.125 2024-08-10 22:39:31,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=12.0 2024-08-10 22:39:39,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=782250.0, ans=0.0 2024-08-10 22:39:47,528 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 16 from Vox, 41 from AS 2024-08-10 22:39:56,813 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 13 from Vox, 36 from AS 2024-08-10 22:40:08,906 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 from AS 2024-08-10 22:40:10,623 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 from AS 2024-08-10 22:40:33,557 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 29 from LS+wenet, 13 from Vox, 28 from AS 2024-08-10 22:40:33,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=782550.0, ans=0.035 2024-08-10 22:40:39,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5800, loss[loss=0.1083, beats_loss=0.01032, ecapa_loss=0.0002486, whisper_loss=0.0955, over 22591.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01171, ecapa_loss=0.0002235, whisper_loss=0.09414, over 3935129.29 frames. 
], batch size: 92, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:40:43,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=782650.0, ans=0.125 2024-08-10 22:40:45,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2024-08-10 22:40:56,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=782750.0, ans=0.0 2024-08-10 22:41:04,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=782750.0, ans=0.125 2024-08-10 22:41:13,985 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 22:41:17,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=782850.0, ans=0.2 2024-08-10 22:41:19,746 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 from AS 2024-08-10 22:41:21,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782850.0, ans=0.1 2024-08-10 22:41:34,409 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS 2024-08-10 22:41:44,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.672e+01 3.034e+01 3.531e+01 4.962e+01, threshold=6.068e+01, percent-clipped=0.0 2024-08-10 22:42:12,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5850, loss[loss=0.1138, beats_loss=0.01218, ecapa_loss=0.000207, whisper_loss=0.09956, over 19162.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01177, ecapa_loss=0.0002223, whisper_loss=0.09365, over 3906891.64 frames. 
], batch size: 75, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:42:59,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2024-08-10 22:43:18,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=783450.0, ans=0.2 2024-08-10 22:43:19,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=783450.0, ans=0.07 2024-08-10 22:43:31,075 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-10 22:43:32,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=783550.0, ans=0.0 2024-08-10 22:43:36,106 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 22:43:41,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5900, loss[loss=0.09748, beats_loss=0.008821, ecapa_loss=0.0002883, whisper_loss=0.08578, over 16101.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01174, ecapa_loss=0.0002232, whisper_loss=0.09392, over 3882825.04 frames. ], batch size: 66, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:43:53,586 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 28 from Vox, 31 from AS 2024-08-10 22:44:10,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=783750.0, ans=0.07 2024-08-10 22:44:12,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=783750.0, ans=0.125 2024-08-10 22:44:34,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=783850.0, ans=0.07 2024-08-10 22:44:41,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=783950.0, ans=0.2 2024-08-10 22:44:47,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.739e+01 3.064e+01 3.610e+01 4.850e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-10 22:44:48,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=12.0 2024-08-10 22:44:55,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=784050.0, ans=0.125 2024-08-10 22:45:15,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 5950, loss[loss=0.07997, beats_loss=0.01414, ecapa_loss=0.0002481, whisper_loss=0.06335, over 21186.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01178, ecapa_loss=0.000223, whisper_loss=0.09356, over 3897902.36 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:45:42,126 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 from AS 2024-08-10 22:45:44,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=784250.0, ans=0.2 2024-08-10 22:46:08,368 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 20 from Vox, 44 from AS 2024-08-10 22:46:17,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2024-08-10 22:46:20,490 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 30 from LS+wenet, 28 from Vox, 38 from AS 2024-08-10 22:46:40,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=15.0 2024-08-10 22:46:41,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=784550.0, ans=0.0 2024-08-10 22:46:46,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6000, loss[loss=0.08577, beats_loss=0.01463, ecapa_loss=0.0001684, whisper_loss=0.06946, over 18273.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01168, ecapa_loss=0.0002241, whisper_loss=0.09453, over 3936813.66 frames. ], batch size: 72, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:46:46,112 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-10 22:47:25,488 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on ASR_libri: loss=0.2592, beats_loss=0, ecapa_loss=0.0006893, whisper_loss=0.2523, over 922467.00 frames. 2024-08-10 22:47:44,056 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on SV_voxceleb1: loss=0.005715, beats_loss=0, ecapa_loss=0.0005715, whisper_loss=0, over 939242.00 frames. 2024-08-10 22:48:33,350 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0750, 1.5440, 2.1686, 2.2290], device='cuda:0') 2024-08-10 22:49:35,336 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on AT_audioset: loss=0.02616, beats_loss=0.02616, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-10 22:49:35,347 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-10 22:49:47,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=784650.0, ans=0.0 2024-08-10 22:50:07,560 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 22:50:09,159 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS 2024-08-10 22:50:11,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.394e-02 2024-08-10 22:50:28,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2024-08-10 22:50:33,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.532e+01 2.986e+01 3.661e+01 5.128e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-10 22:51:00,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6050, loss[loss=0.1262, beats_loss=0.009782, ecapa_loss=0.0002271, whisper_loss=0.1142, over 23010.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01166, ecapa_loss=0.0002233, whisper_loss=0.09463, over 3909331.68 frames. ], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:51:00,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=785150.0, ans=0.0 2024-08-10 22:51:18,653 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS 2024-08-10 22:51:44,287 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 26 from Vox, 32 from AS 2024-08-10 22:51:55,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.47 vs. 
limit=15.0 2024-08-10 22:52:02,605 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 31 from Vox, 32 from AS 2024-08-10 22:52:16,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-10 22:52:18,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2024-08-10 22:52:28,414 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 17 from Vox, 47 from AS 2024-08-10 22:52:35,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6100, loss[loss=0.09662, beats_loss=0.01247, ecapa_loss=0.000253, whisper_loss=0.08163, over 14026.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01163, ecapa_loss=0.0002251, whisper_loss=0.09436, over 3888489.85 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:52:56,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2024-08-10 22:53:04,044 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS 2024-08-10 22:53:04,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=785750.0, ans=0.0 2024-08-10 22:53:15,019 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 22 from Vox, 30 from AS 2024-08-10 22:53:25,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785850.0, ans=0.1 2024-08-10 22:53:29,496 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 20 from Vox, 19 from AS 2024-08-10 22:53:29,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=785950.0, ans=0.125 2024-08-10 22:53:31,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2024-08-10 22:53:35,517 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 30 from Vox, 32 from AS 2024-08-10 22:53:36,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.867e+01 3.222e+01 3.705e+01 5.709e+01, threshold=6.445e+01, percent-clipped=0.0 2024-08-10 22:53:43,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=785950.0, ans=0.0 2024-08-10 22:53:58,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=786050.0, ans=0.0 2024-08-10 22:54:05,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6150, loss[loss=0.1089, beats_loss=0.01148, ecapa_loss=0.0002326, whisper_loss=0.09511, over 23483.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01163, ecapa_loss=0.0002252, whisper_loss=0.09446, over 3907989.96 frames. ], batch size: 96, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:54:19,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=786150.0, ans=0.125 2024-08-10 22:54:28,697 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 22 from Vox, 22 from AS 2024-08-10 22:54:35,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=786250.0, ans=10.0 2024-08-10 22:55:09,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2024-08-10 22:55:12,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=786450.0, ans=0.125 2024-08-10 22:55:32,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6200, loss[loss=0.08684, beats_loss=0.01585, ecapa_loss=0.0001641, whisper_loss=0.06935, over 20917.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01167, ecapa_loss=0.0002232, whisper_loss=0.09445, over 3927213.02 frames. ], batch size: 87, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:55:36,324 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 from AS 2024-08-10 22:55:39,241 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 from AS 2024-08-10 22:55:40,936 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 22:55:49,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=786750.0, ans=0.0 2024-08-10 22:55:51,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=786750.0, ans=0.1 2024-08-10 22:56:17,011 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
17 from LS+wenet, 26 from Vox, 30 from AS 2024-08-10 22:56:28,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=786950.0, ans=0.04949747468305833 2024-08-10 22:56:30,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=786950.0, ans=0.125 2024-08-10 22:56:31,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.713e+01 3.021e+01 3.323e+01 5.362e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-10 22:56:38,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2024-08-10 22:56:41,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787050.0, ans=0.1 2024-08-10 22:56:57,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6250, loss[loss=0.1001, beats_loss=0.012, ecapa_loss=0.0001907, whisper_loss=0.08617, over 21585.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01156, ecapa_loss=0.0002232, whisper_loss=0.09472, over 3916695.84 frames. ], batch size: 86, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:57:33,759 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 22:57:47,437 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
29 from LS+wenet, 22 from Vox, 31 from AS 2024-08-10 22:57:47,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=787450.0, ans=0.125 2024-08-10 22:58:05,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787550.0, ans=0.1 2024-08-10 22:58:12,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=787550.0, ans=0.125 2024-08-10 22:58:20,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=787650.0, ans=0.0 2024-08-10 22:58:20,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6300, loss[loss=0.1243, beats_loss=0.01139, ecapa_loss=0.0002191, whisper_loss=0.1107, over 23293.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01158, ecapa_loss=0.0002217, whisper_loss=0.09514, over 3899201.24 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 22:59:13,914 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 from AS 2024-08-10 22:59:14,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787950.0, ans=0.125 2024-08-10 22:59:14,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=787950.0, ans=0.2 2024-08-10 22:59:14,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. 
limit=10.0 2024-08-10 22:59:19,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.181e+01 2.845e+01 3.065e+01 3.583e+01 5.394e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-10 22:59:25,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787950.0, ans=0.125 2024-08-10 22:59:28,722 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 from AS 2024-08-10 22:59:29,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2024-08-10 22:59:44,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6350, loss[loss=0.1035, beats_loss=0.01292, ecapa_loss=0.0001662, whisper_loss=0.08895, over 22855.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01165, ecapa_loss=0.0002206, whisper_loss=0.09475, over 3911867.21 frames. ], batch size: 91, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:00:01,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-10 23:00:38,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=788450.0, ans=0.09899494936611666 2024-08-10 23:00:46,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=788450.0, ans=0.2 2024-08-10 23:01:09,150 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6400, loss[loss=0.1309, beats_loss=0.01095, ecapa_loss=0.0002147, whisper_loss=0.1178, over 22753.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01162, ecapa_loss=0.0002203, whisper_loss=0.09555, over 3913413.20 frames. 
], batch size: 90, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:01:16,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=788650.0, ans=0.0 2024-08-10 23:01:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=788750.0, ans=0.125 2024-08-10 23:01:40,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=788850.0, ans=0.0 2024-08-10 23:01:50,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=788850.0, ans=0.125 2024-08-10 23:02:07,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.821e+01 3.135e+01 3.560e+01 4.755e+01, threshold=6.269e+01, percent-clipped=0.0 2024-08-10 23:02:23,431 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 from AS 2024-08-10 23:02:31,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-10 23:02:32,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6450, loss[loss=0.1251, beats_loss=0.00711, ecapa_loss=0.0002487, whisper_loss=0.1155, over 17729.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.0116, ecapa_loss=0.0002204, whisper_loss=0.09556, over 3909062.09 frames. ], batch size: 66, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:02:46,762 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 21 from Vox, 24 from AS 2024-08-10 23:03:02,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.34 vs. 
limit=15.0 2024-08-10 23:03:11,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=789350.0, ans=0.125 2024-08-10 23:03:13,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789350.0, ans=0.1 2024-08-10 23:03:24,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=789450.0, ans=0.0 2024-08-10 23:03:50,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-10 23:03:54,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6500, loss[loss=0.116, beats_loss=0.01078, ecapa_loss=0.0002179, whisper_loss=0.103, over 18306.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01153, ecapa_loss=0.0002216, whisper_loss=0.09627, over 3926762.21 frames. ], batch size: 72, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:04:33,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=789850.0, ans=0.125 2024-08-10 23:04:52,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=789950.0, ans=0.1 2024-08-10 23:04:55,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.163e+01 2.899e+01 3.223e+01 3.887e+01 5.763e+01, threshold=6.447e+01, percent-clipped=0.0 2024-08-10 23:05:02,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=790050.0, ans=0.0 2024-08-10 23:05:06,041 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:05:19,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6550, loss[loss=0.1055, 
beats_loss=0.01271, ecapa_loss=0.0002506, whisper_loss=0.09024, over 21718.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01162, ecapa_loss=0.0002227, whisper_loss=0.09568, over 3938771.43 frames. ], batch size: 91, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:05:54,009 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-10 23:05:54,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=790350.0, ans=0.0 2024-08-10 23:06:18,995 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-10 23:06:42,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6600, loss[loss=0.1362, beats_loss=0.00989, ecapa_loss=0.0002201, whisper_loss=0.1241, over 23560.00 frames. ], tot_loss[loss=0.11, beats_loss=0.01156, ecapa_loss=0.0002231, whisper_loss=0.09622, over 3970845.76 frames. ], batch size: 93, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:06:43,520 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-10 23:06:49,762 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:07:02,577 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 from AS 2024-08-10 23:07:07,400 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 24 from Vox, 23 from AS 2024-08-10 23:07:20,267 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 23:07:30,914 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
16 from LS+wenet, 17 from Vox, 20 from AS 2024-08-10 23:07:38,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.957e+01 3.211e+01 3.827e+01 6.878e+01, threshold=6.422e+01, percent-clipped=2.0 2024-08-10 23:07:44,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=790950.0, ans=0.1 2024-08-10 23:07:54,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=791050.0, ans=0.0 2024-08-10 23:08:02,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6650, loss[loss=0.1158, beats_loss=0.009001, ecapa_loss=0.0002331, whisper_loss=0.1045, over 15041.00 frames. ], tot_loss[loss=0.1105, beats_loss=0.0115, ecapa_loss=0.0002227, whisper_loss=0.09682, over 3968010.40 frames. ], batch size: 59, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:08:05,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=791150.0, ans=0.015 2024-08-10 23:08:32,696 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 41 from LS+wenet, 22 from Vox, 29 from AS 2024-08-10 23:08:41,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=791350.0, ans=0.125 2024-08-10 23:09:03,690 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.685e-02 2024-08-10 23:09:11,470 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-10 23:09:12,686 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
27 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-10 23:09:22,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=791650.0, ans=0.125 2024-08-10 23:09:22,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6700, loss[loss=0.1137, beats_loss=0.01113, ecapa_loss=0.0001731, whisper_loss=0.1008, over 16946.00 frames. ], tot_loss[loss=0.1104, beats_loss=0.01148, ecapa_loss=0.0002213, whisper_loss=0.09676, over 3945116.01 frames. ], batch size: 64, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:10:01,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=791850.0, ans=0.125 2024-08-10 23:10:02,500 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-10 23:10:12,456 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-10 23:10:15,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.716e+01 3.104e+01 3.606e+01 5.024e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-10 23:10:17,991 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-10 23:10:32,458 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-10 23:10:32,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=792050.0, ans=0.125 2024-08-10 23:10:33,822 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 23:10:34,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=792050.0, ans=0.125 2024-08-10 23:10:37,754 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6750, loss[loss=0.09316, beats_loss=0.01252, ecapa_loss=0.0002116, whisper_loss=0.07853, over 17468.00 frames. ], tot_loss[loss=0.1098, beats_loss=0.01155, ecapa_loss=0.0002204, whisper_loss=0.09605, over 3927372.38 frames. ], batch size: 69, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:10:44,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=792150.0, ans=0.1 2024-08-10 23:10:47,515 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-10 23:10:57,545 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:11:18,170 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-10 23:11:27,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-10 23:11:30,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=792450.0, ans=0.0 2024-08-10 23:11:45,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=792550.0, ans=0.125 2024-08-10 23:11:51,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=792550.0, ans=0.125 2024-08-10 23:11:54,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6800, loss[loss=0.1094, beats_loss=0.00929, ecapa_loss=0.0002171, whisper_loss=0.09791, over 18078.00 frames. 
], tot_loss[loss=0.1094, beats_loss=0.01151, ecapa_loss=0.0002216, whisper_loss=0.09562, over 3931535.59 frames. ], batch size: 70, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:12:23,893 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-10 23:12:40,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-10 23:12:42,740 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-10 23:12:44,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=792950.0, ans=0.125 2024-08-10 23:12:44,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=792950.0, ans=0.125 2024-08-10 23:12:46,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.319e+01 2.823e+01 3.224e+01 3.746e+01 6.225e+01, threshold=6.449e+01, percent-clipped=1.0 2024-08-10 23:13:00,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=793050.0, ans=0.125 2024-08-10 23:13:04,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=793050.0, ans=0.125 2024-08-10 23:13:08,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-08-10 23:13:10,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6850, loss[loss=0.1242, beats_loss=0.009105, ecapa_loss=0.0002529, whisper_loss=0.1126, over 17294.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01145, ecapa_loss=0.000223, whisper_loss=0.09534, over 3894706.70 frames. 
], batch size: 69, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:13:14,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=793150.0, ans=0.125 2024-08-10 23:13:41,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=793350.0, ans=0.125 2024-08-10 23:13:42,493 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-10 23:13:44,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=793350.0, ans=0.0 2024-08-10 23:14:04,507 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-10 23:14:12,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=793550.0, ans=0.125 2024-08-10 23:14:19,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-10 23:14:28,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6900, loss[loss=0.1022, beats_loss=0.01343, ecapa_loss=0.0002237, whisper_loss=0.08656, over 22591.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01159, ecapa_loss=0.0002223, whisper_loss=0.09488, over 3890676.25 frames. ], batch size: 91, lr: 1.05e-02, grad_scale: 17592186044416.0 2024-08-10 23:14:31,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-10 23:14:41,700 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-10 23:14:50,814 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-10 23:15:00,167 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-10 23:15:03,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=793850.0, ans=0.125 2024-08-10 23:15:12,967 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-10 23:15:13,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=793950.0, ans=0.0 2024-08-10 23:15:20,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.839e+01 3.157e+01 3.612e+01 7.302e+01, threshold=6.314e+01, percent-clipped=1.0 2024-08-10 23:15:29,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:35,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:37,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:37,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2024-08-10 23:15:39,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=794050.0, ans=0.125 2024-08-10 23:15:42,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 6950, loss[loss=0.1181, beats_loss=0.01141, ecapa_loss=0.0002466, whisper_loss=0.1042, over 22751.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0116, ecapa_loss=0.000221, whisper_loss=0.095, over 3887061.16 frames. 
], batch size: 92, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:16:10,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=794350.0, ans=0.125 2024-08-10 23:16:14,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=794350.0, ans=0.05 2024-08-10 23:16:19,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-10 23:16:27,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794450.0, ans=0.1 2024-08-10 23:16:33,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=794450.0, ans=0.0 2024-08-10 23:16:38,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=794450.0, ans=0.0 2024-08-10 23:16:40,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=794550.0, ans=0.07 2024-08-10 23:16:52,633 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 40 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-10 23:16:57,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7000, loss[loss=0.09638, beats_loss=0.009562, ecapa_loss=0.0002461, whisper_loss=0.08436, over 15079.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01162, ecapa_loss=0.0002205, whisper_loss=0.09471, over 3882165.02 frames. ], batch size: 61, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:17:06,735 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-10 23:17:09,326 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-10 23:17:39,596 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-10 23:17:49,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.681e+01 2.983e+01 3.369e+01 6.385e+01, threshold=5.967e+01, percent-clipped=1.0 2024-08-10 23:17:52,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-08-10 23:18:12,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7050, loss[loss=0.09317, beats_loss=0.01344, ecapa_loss=0.000217, whisper_loss=0.07756, over 17271.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01165, ecapa_loss=0.0002206, whisper_loss=0.09342, over 3873688.37 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:18:23,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=795150.0, ans=0.0 2024-08-10 23:18:37,139 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-10 23:18:45,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=795350.0, ans=0.2 2024-08-10 23:18:46,627 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-10 23:18:52,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795350.0, ans=0.125 2024-08-10 23:19:00,371 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-10 23:19:28,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7100, loss[loss=0.09904, beats_loss=0.01162, ecapa_loss=0.0002254, whisper_loss=0.08517, over 22136.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01166, ecapa_loss=0.0002193, whisper_loss=0.09355, over 3880069.48 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:19:29,077 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-10 23:19:32,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=795650.0, ans=15.0 2024-08-10 23:19:34,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-10 23:19:38,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=795650.0, ans=0.05 2024-08-10 23:19:40,727 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-10 23:20:10,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795850.0, ans=0.1 2024-08-10 23:20:21,761 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-10 23:20:22,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.587e+01 2.924e+01 3.368e+01 5.025e+01, threshold=5.848e+01, percent-clipped=0.0 2024-08-10 23:20:30,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.62 vs. limit=15.0 2024-08-10 23:20:32,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=796050.0, ans=0.125 2024-08-10 23:20:37,931 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-10 23:20:46,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7150, loss[loss=0.119, beats_loss=0.01108, ecapa_loss=0.0002462, whisper_loss=0.1054, over 17884.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01168, ecapa_loss=0.0002205, whisper_loss=0.09341, over 3872751.05 frames. ], batch size: 72, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:21:08,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796250.0, ans=0.1 2024-08-10 23:21:15,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=796350.0, ans=0.0 2024-08-10 23:21:25,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-10 23:21:30,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=796450.0, ans=0.2 2024-08-10 23:21:31,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2024-08-10 23:21:32,069 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-10 23:21:32,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=796450.0, ans=0.1 2024-08-10 23:21:33,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796450.0, ans=0.1 2024-08-10 23:21:38,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=796450.0, ans=10.0 2024-08-10 23:22:00,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7200, loss[loss=0.09165, beats_loss=0.01277, ecapa_loss=0.0002, whisper_loss=0.07688, over 17914.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01174, ecapa_loss=0.0002203, whisper_loss=0.09314, over 3888537.39 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:22:09,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=796650.0, ans=0.2 2024-08-10 23:22:30,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796750.0, ans=0.125 2024-08-10 23:22:44,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=12.0 2024-08-10 23:22:53,696 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-10 23:22:54,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.940e+01 3.360e+01 3.850e+01 6.660e+01, threshold=6.719e+01, percent-clipped=3.0 2024-08-10 23:23:01,312 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-10 23:23:05,755 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
19 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-10 23:23:17,507 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7250, loss[loss=0.12, beats_loss=0.01028, ecapa_loss=0.000188, whisper_loss=0.1078, over 23242.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01165, ecapa_loss=0.0002195, whisper_loss=0.09398, over 3917418.81 frames. ], batch size: 90, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:23:23,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=797150.0, ans=0.125 2024-08-10 23:23:48,403 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 15 from Vox, 53 fro AS 2024-08-10 23:23:48,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=797350.0, ans=0.2 2024-08-10 23:23:53,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=797350.0, ans=0.1 2024-08-10 23:23:58,604 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.859e-02 2024-08-10 23:24:00,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=797350.0, ans=0.0 2024-08-10 23:24:00,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=797350.0, ans=0.125 2024-08-10 23:24:19,039 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-10 23:24:21,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=15.0 2024-08-10 23:24:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=797550.0, ans=0.125 2024-08-10 23:24:30,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.046e-01 2024-08-10 23:24:31,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7300, loss[loss=0.1112, beats_loss=0.01326, ecapa_loss=0.0002373, whisper_loss=0.09553, over 20337.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01169, ecapa_loss=0.0002181, whisper_loss=0.09464, over 3933799.28 frames. ], batch size: 84, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:24:55,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=797750.0, ans=0.0 2024-08-10 23:24:56,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797750.0, ans=0.125 2024-08-10 23:25:03,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=797850.0, ans=0.125 2024-08-10 23:25:21,565 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-10 23:25:22,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.736e+01 3.135e+01 3.639e+01 8.330e+01, threshold=6.270e+01, percent-clipped=2.0 2024-08-10 23:25:23,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=797950.0, ans=0.1 2024-08-10 23:25:35,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=798050.0, ans=0.1 2024-08-10 23:25:43,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7350, loss[loss=0.09311, beats_loss=0.01322, ecapa_loss=0.0002174, whisper_loss=0.07772, over 17615.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01169, ecapa_loss=0.0002189, whisper_loss=0.09464, over 3915869.22 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:25:52,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=798150.0, ans=0.125 2024-08-10 23:25:56,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=798250.0, ans=0.125 2024-08-10 23:26:08,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=798250.0, ans=0.125 2024-08-10 23:26:09,988 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-10 23:26:21,498 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-10 23:26:25,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0 2024-08-10 23:26:27,775 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-10 23:26:54,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7400, loss[loss=0.1177, beats_loss=0.0112, ecapa_loss=0.0001975, whisper_loss=0.1045, over 20846.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01174, ecapa_loss=0.0002178, whisper_loss=0.09397, over 3902378.41 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:26:57,194 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-10 23:27:06,499 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-10 23:27:31,784 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-10 23:27:35,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=798950.0, ans=0.0 2024-08-10 23:27:41,799 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 23:27:42,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.647e+01 3.053e+01 3.534e+01 7.826e+01, threshold=6.106e+01, percent-clipped=2.0 2024-08-10 23:27:46,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-10 23:28:04,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7450, loss[loss=0.1262, beats_loss=0.01019, ecapa_loss=0.0003009, whisper_loss=0.113, over 20615.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01159, ecapa_loss=0.0002207, whisper_loss=0.0945, over 3895028.65 frames. 
], batch size: 84, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:28:09,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=799150.0, ans=0.0 2024-08-10 23:28:20,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=799250.0, ans=0.125 2024-08-10 23:28:23,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=799250.0, ans=0.125 2024-08-10 23:28:42,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=799350.0, ans=0.125 2024-08-10 23:28:52,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=799450.0, ans=0.125 2024-08-10 23:28:57,531 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-10 23:29:00,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=799550.0, ans=0.07 2024-08-10 23:29:03,271 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-10 23:29:05,165 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-10 23:29:07,703 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-10 23:29:12,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7500, loss[loss=0.1172, beats_loss=0.00866, ecapa_loss=0.0002522, whisper_loss=0.1061, over 14926.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01161, ecapa_loss=0.0002218, whisper_loss=0.09401, over 3891362.82 frames. 
], batch size: 57, lr: 1.04e-02, grad_scale: 17592186044416.0 2024-08-10 23:29:30,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=799750.0, ans=0.0 2024-08-10 23:29:32,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-10 23:29:41,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=799850.0, ans=0.07 2024-08-10 23:29:58,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799950.0, ans=0.1 2024-08-10 23:30:00,684 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-80000.pt 2024-08-10 23:30:04,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.160e+01 2.761e+01 3.186e+01 3.767e+01 5.987e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-10 23:30:06,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=799950.0, ans=0.125 2024-08-10 23:30:23,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=800050.0, ans=0.125 2024-08-10 23:30:25,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7550, loss[loss=0.111, beats_loss=0.01315, ecapa_loss=0.0002087, whisper_loss=0.09576, over 21233.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01164, ecapa_loss=0.0002205, whisper_loss=0.09396, over 3859747.10 frames. 
], batch size: 84, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:30:26,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=800150.0, ans=0.07 2024-08-10 23:30:28,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2024-08-10 23:30:29,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=800150.0, ans=0.0 2024-08-10 23:30:32,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=12.0 2024-08-10 23:30:37,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=800150.0, ans=0.07 2024-08-10 23:31:00,116 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-10 23:31:01,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=800350.0, ans=0.09899494936611666 2024-08-10 23:31:08,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=800450.0, ans=0.035 2024-08-10 23:31:27,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=800550.0, ans=0.0 2024-08-10 23:31:29,964 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-10 23:31:38,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7600, loss[loss=0.1017, beats_loss=0.01127, ecapa_loss=0.0002215, whisper_loss=0.08823, over 17666.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0116, ecapa_loss=0.000221, whisper_loss=0.0947, over 3864704.36 frames. 
], batch size: 69, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:31:46,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=800650.0, ans=0.025 2024-08-10 23:31:55,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-10 23:32:06,693 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-10 23:32:07,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2024-08-10 23:32:12,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=800850.0, ans=0.0 2024-08-10 23:32:15,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800850.0, ans=0.1 2024-08-10 23:32:22,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=800950.0, ans=0.125 2024-08-10 23:32:26,207 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-10 23:32:28,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=800950.0, ans=0.125 2024-08-10 23:32:30,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.785e+01 3.066e+01 3.767e+01 8.128e+01, threshold=6.132e+01, percent-clipped=1.0 2024-08-10 23:32:31,717 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 19 from Vox, 33 from AS 2024-08-10 23:32:32,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2024-08-10 23:32:50,328 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 from AS 2024-08-10 23:32:51,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7650, loss[loss=0.09441, beats_loss=0.01222, ecapa_loss=0.0001939, whisper_loss=0.08025, over 21118.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01151, ecapa_loss=0.0002212, whisper_loss=0.0954, over 3880276.70 frames. ], batch size: 83, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:32:57,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=801150.0, ans=0.2 2024-08-10 23:33:04,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=801250.0, ans=0.0 2024-08-10 23:33:09,801 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 10 from Vox, 29 from AS 2024-08-10 23:33:11,148 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 from AS 2024-08-10 23:33:16,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=801250.0, ans=0.0 2024-08-10 23:33:17,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=801350.0, ans=0.0 2024-08-10 23:33:19,219 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 from AS 2024-08-10 23:33:19,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2024-08-10 23:33:36,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=801450.0, ans=0.0 2024-08-10 23:33:46,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-08-10 23:33:58,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=801550.0, ans=0.2 2024-08-10 23:34:01,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7700, loss[loss=0.1115, beats_loss=0.01039, ecapa_loss=0.0002122, whisper_loss=0.09894, over 22687.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01149, ecapa_loss=0.0002203, whisper_loss=0.09475, over 3875521.23 frames. ], batch size: 92, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:34:08,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=801650.0, ans=0.0 2024-08-10 23:34:09,796 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 38 from Vox, 32 from AS 2024-08-10 23:34:36,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=801850.0, ans=0.2 2024-08-10 23:34:39,692 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 from AS 2024-08-10 23:34:43,797 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 21 from Vox, 33 from AS 2024-08-10 23:34:50,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.824e+01 3.342e+01 3.789e+01 5.468e+01, threshold=6.684e+01, percent-clipped=0.0 2024-08-10 23:35:11,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7750, loss[loss=0.1181, beats_loss=0.009263, ecapa_loss=0.0002356, whisper_loss=0.1065, over 22325.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01148, ecapa_loss=0.0002203, whisper_loss=0.09434, over 3861738.27 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:35:18,591 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS 2024-08-10 23:35:40,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=802350.0, ans=0.0 2024-08-10 23:35:54,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=802450.0, ans=0.125 2024-08-10 23:36:01,575 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 23:36:25,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7800, loss[loss=0.07761, beats_loss=0.01628, ecapa_loss=0.0001686, whisper_loss=0.05964, over 22823.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01154, ecapa_loss=0.0002197, whisper_loss=0.09428, over 3879443.37 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:36:44,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=802750.0, ans=0.1 2024-08-10 23:36:51,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=802750.0, ans=0.125 2024-08-10 23:36:54,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=802850.0, ans=0.125 2024-08-10 23:36:56,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=802850.0, ans=0.0 2024-08-10 23:37:07,455 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 from AS 2024-08-10 23:37:08,955 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
16 from LS+wenet, 17 from Vox, 29 from AS 2024-08-10 23:37:10,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=15.0 2024-08-10 23:37:12,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-10 23:37:13,318 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 from AS 2024-08-10 23:37:14,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.252e+01 2.895e+01 3.316e+01 3.988e+01 7.505e+01, threshold=6.631e+01, percent-clipped=2.0 2024-08-10 23:37:16,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=802950.0, ans=0.125 2024-08-10 23:37:17,325 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 from AS 2024-08-10 23:37:35,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7850, loss[loss=0.1158, beats_loss=0.009532, ecapa_loss=0.0001935, whisper_loss=0.1043, over 18100.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01156, ecapa_loss=0.0002197, whisper_loss=0.09485, over 3894536.52 frames. ], batch size: 68, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:37:43,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=803150.0, ans=0.0 2024-08-10 23:38:24,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=803450.0, ans=0.2 2024-08-10 23:38:29,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.24 vs. 
limit=15.0 2024-08-10 23:38:33,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=803550.0, ans=0.04949747468305833 2024-08-10 23:38:37,898 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 22 from LS+wenet, 30 from Vox, 42 from AS 2024-08-10 23:38:43,652 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 from AS 2024-08-10 23:38:47,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7900, loss[loss=0.1224, beats_loss=0.009631, ecapa_loss=0.00019, whisper_loss=0.1109, over 16708.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01165, ecapa_loss=0.0002194, whisper_loss=0.09386, over 3878069.98 frames. ], batch size: 63, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:38:47,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803650.0, ans=0.1 2024-08-10 23:38:58,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-10 23:39:12,002 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 18 from Vox, 43 from AS 2024-08-10 23:39:37,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.898e+01 3.197e+01 3.826e+01 5.899e+01, threshold=6.393e+01, percent-clipped=0.0 2024-08-10 23:39:37,762 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS 2024-08-10 23:39:55,772 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 23:39:57,058 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 11 from Vox, 28 from AS 2024-08-10 23:39:58,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 7950, loss[loss=0.1239, beats_loss=0.01199, ecapa_loss=0.0001806, whisper_loss=0.1101, over 16968.00 frames. 
], tot_loss[loss=0.1081, beats_loss=0.01159, ecapa_loss=0.0002206, whisper_loss=0.0943, over 3874906.26 frames. ], batch size: 62, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:40:13,563 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 23:40:24,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804350.0, ans=0.125 2024-08-10 23:40:45,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=804450.0, ans=0.125 2024-08-10 23:40:49,646 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 from AS 2024-08-10 23:40:52,335 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 from AS 2024-08-10 23:40:56,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-08-10 23:41:05,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8000, loss[loss=0.1175, beats_loss=0.01163, ecapa_loss=0.0002251, whisper_loss=0.1036, over 22132.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01161, ecapa_loss=0.0002196, whisper_loss=0.09434, over 3864855.89 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:41:11,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-10 23:41:16,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=804650.0, ans=0.0 2024-08-10 23:41:24,231 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 from AS 2024-08-10 23:41:28,744 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 from AS 2024-08-10 23:41:28,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804750.0, ans=0.125 2024-08-10 23:41:35,156 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 from AS 2024-08-10 23:41:43,278 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 from AS 2024-08-10 23:41:46,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=804950.0, ans=0.125 2024-08-10 23:41:51,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.180e+01 2.694e+01 3.160e+01 3.631e+01 6.005e+01, threshold=6.321e+01, percent-clipped=0.0 2024-08-10 23:41:59,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=805050.0, ans=0.0 2024-08-10 23:42:04,120 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 from AS 2024-08-10 23:42:05,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=805050.0, ans=0.125 2024-08-10 23:42:10,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0 2024-08-10 23:42:11,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8050, loss[loss=0.1299, beats_loss=0.009609, ecapa_loss=0.0002173, whisper_loss=0.1182, over 22805.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01154, ecapa_loss=0.0002181, whisper_loss=0.09481, over 3879300.09 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:42:29,325 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS 2024-08-10 23:42:39,846 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
25 from LS+wenet, 27 from Vox, 28 from AS 2024-08-10 23:42:42,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=805350.0, ans=0.0 2024-08-10 23:42:45,102 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-10 23:42:56,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=805450.0, ans=0.125 2024-08-10 23:42:59,310 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 from AS 2024-08-10 23:43:05,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=805550.0, ans=0.025 2024-08-10 23:43:18,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8100, loss[loss=0.1326, beats_loss=0.009516, ecapa_loss=0.0002175, whisper_loss=0.1209, over 22958.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01147, ecapa_loss=0.0002192, whisper_loss=0.09535, over 3867195.34 frames. ], batch size: 91, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:43:18,331 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 16 from Vox, 39 from AS 2024-08-10 23:43:46,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=805850.0, ans=0.125 2024-08-10 23:43:54,942 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
35 from LS+wenet, 18 from Vox, 41 from AS 2024-08-10 23:44:00,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=805950.0, ans=0.07 2024-08-10 23:44:04,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.753e+01 3.174e+01 3.635e+01 5.123e+01, threshold=6.349e+01, percent-clipped=0.0 2024-08-10 23:44:11,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=806050.0, ans=0.0 2024-08-10 23:44:19,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=806050.0, ans=0.0 2024-08-10 23:44:20,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-08-10 23:44:24,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8150, loss[loss=0.08617, beats_loss=0.01301, ecapa_loss=0.0001616, whisper_loss=0.07154, over 19506.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01145, ecapa_loss=0.0002189, whisper_loss=0.09549, over 3920625.18 frames. ], batch size: 74, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:44:46,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2024-08-10 23:44:47,220 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 23 from Vox, 34 from AS 2024-08-10 23:45:06,712 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 31 from Vox, 38 from AS 2024-08-10 23:45:20,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. 
limit=10.0 2024-08-10 23:45:25,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=806550.0, ans=0.0 2024-08-10 23:45:30,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8200, loss[loss=0.1089, beats_loss=0.01076, ecapa_loss=0.0002172, whisper_loss=0.09594, over 22468.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01145, ecapa_loss=0.00022, whisper_loss=0.09539, over 3921785.78 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:46:14,066 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 18 from Vox, 38 from AS 2024-08-10 23:46:14,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-10 23:46:15,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:16,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.206e+01 2.784e+01 3.124e+01 3.627e+01 5.044e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-10 23:46:16,525 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 from AS 2024-08-10 23:46:18,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=806950.0, ans=0.125 2024-08-10 23:46:19,120 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 8 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 23:46:36,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8250, loss[loss=0.1201, beats_loss=0.01008, ecapa_loss=0.0002114, whisper_loss=0.1079, over 23240.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01155, ecapa_loss=0.0002194, whisper_loss=0.09512, over 3943266.36 frames. 
], batch size: 90, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:46:40,145 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 from AS 2024-08-10 23:46:48,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=807250.0, ans=0.0 2024-08-10 23:46:55,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=807250.0, ans=0.125 2024-08-10 23:46:58,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807250.0, ans=0.1 2024-08-10 23:47:04,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-10 23:47:05,105 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 from AS 2024-08-10 23:47:08,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=807350.0, ans=0.125 2024-08-10 23:47:09,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=807350.0, ans=0.125 2024-08-10 23:47:09,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=807350.0, ans=0.125 2024-08-10 23:47:11,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.55 vs. limit=15.0 2024-08-10 23:47:11,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.09 vs. 
limit=15.0 2024-08-10 23:47:22,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=807450.0, ans=0.125 2024-08-10 23:47:42,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8300, loss[loss=0.1198, beats_loss=0.01019, ecapa_loss=0.000217, whisper_loss=0.1075, over 16766.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01162, ecapa_loss=0.0002183, whisper_loss=0.09496, over 3892953.01 frames. ], batch size: 65, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:48:01,365 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 from AS 2024-08-10 23:48:29,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.802e+01 3.186e+01 3.664e+01 3.254e+02, threshold=6.372e+01, percent-clipped=4.0 2024-08-10 23:48:47,487 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 19 from Vox, 37 from AS 2024-08-10 23:48:48,742 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8350, loss[loss=0.129, beats_loss=0.01112, ecapa_loss=0.0002559, whisper_loss=0.1153, over 23354.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01162, ecapa_loss=0.0002195, whisper_loss=0.09521, over 3903348.51 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:48:58,625 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 from AS 2024-08-10 23:49:02,734 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 23 from Vox, 29 from AS 2024-08-10 23:49:03,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.33 vs. limit=15.0 2024-08-10 23:49:09,719 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 21 from Vox, 25 from AS 2024-08-10 23:49:30,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-10 23:49:34,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=808450.0, ans=0.2 2024-08-10 23:49:53,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8400, loss[loss=0.1237, beats_loss=0.0106, ecapa_loss=0.0001679, whisper_loss=0.1115, over 23895.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01153, ecapa_loss=0.0002218, whisper_loss=0.09562, over 3913548.91 frames. ], batch size: 88, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:49:56,443 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 from AS 2024-08-10 23:50:00,205 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 23 from Vox, 20 from AS 2024-08-10 23:50:16,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=808750.0, ans=0.1 2024-08-10 23:50:23,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.38 vs. limit=15.0 2024-08-10 23:50:25,300 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
22 from LS+wenet, 16 from Vox, 31 from AS 2024-08-10 23:50:30,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808850.0, ans=0.1 2024-08-10 23:50:37,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=808950.0, ans=0.125 2024-08-10 23:50:39,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.636e+01 3.039e+01 3.423e+01 5.250e+01, threshold=6.078e+01, percent-clipped=0.0 2024-08-10 23:50:46,272 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 16 from Vox, 34 from AS 2024-08-10 23:50:59,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8450, loss[loss=0.1004, beats_loss=0.01237, ecapa_loss=0.0001658, whisper_loss=0.08637, over 17992.00 frames. ], tot_loss[loss=0.1097, beats_loss=0.01146, ecapa_loss=0.0002211, whisper_loss=0.096, over 3873202.12 frames. ], batch size: 66, lr: 1.04e-02, grad_scale: 35184372088832.0 2024-08-10 23:51:00,615 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 from AS 2024-08-10 23:51:15,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=809250.0, ans=0.0 2024-08-10 23:51:18,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=809250.0, ans=0.125 2024-08-10 23:51:45,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=809450.0, ans=0.0 2024-08-10 23:51:46,200 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 23:51:50,634 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 23 from Vox, 44 from AS 2024-08-10 23:52:06,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8500, loss[loss=0.1169, beats_loss=0.009308, ecapa_loss=0.0002416, whisper_loss=0.1051, over 22037.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01154, ecapa_loss=0.000221, whisper_loss=0.09533, over 3920284.10 frames. ], batch size: 87, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:52:36,711 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS 2024-08-10 23:52:43,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=809850.0, ans=15.0 2024-08-10 23:52:45,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=809850.0, ans=0.125 2024-08-10 23:52:50,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=809950.0, ans=0.125 2024-08-10 23:52:59,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+01 2.787e+01 3.102e+01 3.651e+01 5.135e+01, threshold=6.204e+01, percent-clipped=0.0 2024-08-10 23:53:00,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=809950.0, ans=0.125 2024-08-10 23:53:04,945 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 17 from Vox, 32 from AS 2024-08-10 23:53:08,364 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 from AS 2024-08-10 23:53:08,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=810050.0, ans=0.0 2024-08-10 23:53:11,637 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-10 23:53:13,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=810050.0, ans=0.2 2024-08-10 23:53:16,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2024-08-10 23:53:19,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=810050.0, ans=0.0 2024-08-10 23:53:21,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8550, loss[loss=0.1104, beats_loss=0.009625, ecapa_loss=0.0002289, whisper_loss=0.09845, over 19529.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01154, ecapa_loss=0.0002189, whisper_loss=0.0957, over 3918249.55 frames. ], batch size: 77, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:53:26,922 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 from AS 2024-08-10 23:53:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=810150.0, ans=10.0 2024-08-10 23:53:34,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2024-08-10 23:53:51,989 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.076e-01 2024-08-10 23:54:04,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810450.0, ans=0.1 2024-08-10 23:54:09,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=810450.0, ans=0.125 2024-08-10 23:54:23,835 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 14 from Vox, 28 from AS 2024-08-10 23:54:25,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-08-10 23:54:26,570 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 32 from Vox, 32 from AS 2024-08-10 23:54:34,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8600, loss[loss=0.09378, beats_loss=0.009208, ecapa_loss=0.0002488, whisper_loss=0.08209, over 20327.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01154, ecapa_loss=0.0002183, whisper_loss=0.09582, over 3875579.61 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:54:39,391 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.594e-03 2024-08-10 23:54:40,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=810650.0, ans=0.125 2024-08-10 23:54:46,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=810650.0, ans=0.125 2024-08-10 23:54:55,827 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS 2024-08-10 23:54:57,273 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 from AS 2024-08-10 23:54:58,914 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 from AS 2024-08-10 23:55:10,122 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 from AS 2024-08-10 23:55:12,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. 
limit=6.0 2024-08-10 23:55:23,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.846e+01 3.382e+01 3.840e+01 6.128e+01, threshold=6.764e+01, percent-clipped=0.0 2024-08-10 23:55:24,143 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-10 23:55:31,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=811050.0, ans=0.125 2024-08-10 23:55:38,781 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-10 23:55:44,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8650, loss[loss=0.0875, beats_loss=0.01525, ecapa_loss=0.0001944, whisper_loss=0.0703, over 22127.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01163, ecapa_loss=0.0002185, whisper_loss=0.09558, over 3878105.87 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:55:56,203 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 16 from Vox, 38 from AS 2024-08-10 23:56:34,067 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS 2024-08-10 23:56:43,287 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 from AS 2024-08-10 23:56:49,106 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS 2024-08-10 23:56:54,092 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 12 from Vox, 27 from AS 2024-08-10 23:56:55,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8700, loss[loss=0.09114, beats_loss=0.01239, ecapa_loss=0.0001772, whisper_loss=0.07698, over 14564.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01162, ecapa_loss=0.0002183, whisper_loss=0.09564, over 3879726.96 frames. 
], batch size: 54, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:56:58,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=811650.0, ans=0.125 2024-08-10 23:57:17,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2024-08-10 23:57:33,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5 2024-08-10 23:57:43,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.694e+01 2.974e+01 3.412e+01 6.571e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-10 23:57:43,902 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-10 23:57:47,998 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-10 23:57:56,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=812050.0, ans=0.1 2024-08-10 23:57:58,943 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-10 23:58:03,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-10 23:58:04,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8750, loss[loss=0.118, beats_loss=0.01251, ecapa_loss=0.000201, whisper_loss=0.1034, over 17418.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01162, ecapa_loss=0.0002188, whisper_loss=0.09478, over 3863892.45 frames. ], batch size: 67, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:58:13,354 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
17 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-10 23:58:25,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=812250.0, ans=0.2 2024-08-10 23:58:31,608 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-10 23:58:31,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=812350.0, ans=0.125 2024-08-10 23:58:46,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.23 vs. limit=12.0 2024-08-10 23:59:10,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.12 vs. limit=22.5 2024-08-10 23:59:11,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2024-08-10 23:59:12,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8800, loss[loss=0.09813, beats_loss=0.0135, ecapa_loss=0.0002434, whisper_loss=0.0822, over 19366.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01172, ecapa_loss=0.0002185, whisper_loss=0.09382, over 3857057.89 frames. ], batch size: 82, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-10 23:59:30,506 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
18 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-10 23:59:45,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=812850.0, ans=0.0 2024-08-10 23:59:57,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=812950.0, ans=0.0 2024-08-10 23:59:59,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.902e+01 3.394e+01 3.776e+01 5.499e+01, threshold=6.788e+01, percent-clipped=0.0 2024-08-11 00:00:21,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8850, loss[loss=0.09937, beats_loss=0.01012, ecapa_loss=0.0002182, whisper_loss=0.08706, over 19245.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01175, ecapa_loss=0.0002185, whisper_loss=0.094, over 3888764.02 frames. ], batch size: 73, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:00:26,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.77 vs. limit=10.0 2024-08-11 00:00:35,859 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-11 00:00:37,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-08-11 00:00:39,712 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 00:00:46,478 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-11 00:00:48,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=813350.0, ans=0.125 2024-08-11 00:01:00,825 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 00:01:02,515 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
12 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 00:01:18,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-11 00:01:24,701 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:01:30,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8900, loss[loss=0.08891, beats_loss=0.01002, ecapa_loss=0.0002133, whisper_loss=0.07675, over 14176.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01173, ecapa_loss=0.0002202, whisper_loss=0.09375, over 3872834.23 frames. ], batch size: 53, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:01:37,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=813650.0, ans=0.0 2024-08-11 00:01:50,095 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-11 00:01:57,077 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 00:01:59,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813850.0, ans=0.125 2024-08-11 00:02:00,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=813850.0, ans=0.125 2024-08-11 00:02:18,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.661e+01 2.983e+01 3.454e+01 5.391e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 00:02:21,087 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-11 00:02:22,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=813950.0, ans=0.025 2024-08-11 00:02:27,688 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 00:02:37,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 8950, loss[loss=0.1058, beats_loss=0.01089, ecapa_loss=0.000241, whisper_loss=0.09246, over 21209.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01171, ecapa_loss=0.00022, whisper_loss=0.09286, over 3849976.59 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:02:41,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0 2024-08-11 00:03:03,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=814350.0, ans=0.125 2024-08-11 00:03:27,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814450.0, ans=0.125 2024-08-11 00:03:28,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=814450.0, ans=0.125 2024-08-11 00:03:31,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.04 vs. limit=10.0 2024-08-11 00:03:42,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.53 vs. limit=15.0 2024-08-11 00:03:44,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9000, loss[loss=0.09265, beats_loss=0.01405, ecapa_loss=0.0002393, whisper_loss=0.07621, over 15108.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.01158, ecapa_loss=0.0002198, whisper_loss=0.09326, over 3821488.77 frames. ], batch size: 64, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:03:44,139 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 00:04:24,148 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on ASR_libri: loss=0.2598, beats_loss=0, ecapa_loss=0.0006942, whisper_loss=0.2529, over 922467.00 frames. 2024-08-11 00:04:43,483 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 00:06:37,823 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on AT_audioset: loss=0.02592, beats_loss=0.02592, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 00:06:37,827 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 00:06:41,986 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 00:06:45,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-11 00:06:48,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=814650.0, ans=0.125 2024-08-11 00:06:49,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814650.0, ans=0.1 2024-08-11 00:06:57,568 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 00:07:12,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. 
limit=15.0 2024-08-11 00:07:27,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=814950.0, ans=0.025 2024-08-11 00:07:30,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.866e+01 3.382e+01 4.145e+01 7.682e+01, threshold=6.764e+01, percent-clipped=3.0 2024-08-11 00:07:37,467 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 00:07:49,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=815050.0, ans=0.07 2024-08-11 00:07:54,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9050, loss[loss=0.1183, beats_loss=0.01138, ecapa_loss=0.0001883, whisper_loss=0.105, over 23097.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01156, ecapa_loss=0.0002198, whisper_loss=0.09416, over 3856177.41 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:08:04,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-08-11 00:08:17,661 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 00:08:22,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815350.0, ans=0.125 2024-08-11 00:08:40,455 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 00:08:52,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=815550.0, ans=0.125 2024-08-11 00:08:53,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=815550.0, ans=0.2 2024-08-11 00:09:08,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9100, loss[loss=0.09309, beats_loss=0.01042, ecapa_loss=0.0002873, whisper_loss=0.0798, over 14895.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01164, ecapa_loss=0.00022, whisper_loss=0.09393, over 3884344.59 frames. ], batch size: 63, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:09:09,721 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 00:09:10,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-11 00:09:17,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815650.0, ans=0.125 2024-08-11 00:09:25,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=15.0 2024-08-11 00:09:32,603 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 00:09:44,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=815850.0, ans=0.2 2024-08-11 00:09:52,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=815950.0, ans=0.1 2024-08-11 00:09:58,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+01 2.712e+01 2.999e+01 3.385e+01 5.028e+01, threshold=5.998e+01, percent-clipped=0.0 2024-08-11 00:10:12,255 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 00:10:13,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2024-08-11 00:10:19,278 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.547e+00 2024-08-11 00:10:20,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9150, loss[loss=0.1125, beats_loss=0.01252, ecapa_loss=0.0002451, whisper_loss=0.09753, over 21834.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01164, ecapa_loss=0.0002208, whisper_loss=0.09397, over 3908849.70 frames. ], batch size: 88, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:10:50,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=816350.0, ans=0.0 2024-08-11 00:11:04,055 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 00:11:08,926 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 00:11:13,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=816450.0, ans=10.0 2024-08-11 00:11:17,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-08-11 00:11:36,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9200, loss[loss=0.09836, beats_loss=0.01191, ecapa_loss=0.000229, whisper_loss=0.08416, over 22135.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01161, ecapa_loss=0.0002208, whisper_loss=0.09449, over 3921220.81 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:11:51,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816750.0, ans=0.1 2024-08-11 00:11:51,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816750.0, ans=0.1 2024-08-11 00:11:58,308 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 00:12:06,099 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 00:12:06,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=816850.0, ans=0.125 2024-08-11 00:12:11,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816850.0, ans=0.125 2024-08-11 00:12:27,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.632e+01 3.033e+01 3.497e+01 1.383e+02, threshold=6.066e+01, percent-clipped=1.0 2024-08-11 00:12:32,946 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.877e-02 2024-08-11 00:12:48,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9250, loss[loss=0.1162, beats_loss=0.01005, ecapa_loss=0.0001936, whisper_loss=0.1042, over 16572.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01155, ecapa_loss=0.0002208, whisper_loss=0.09451, over 3934319.84 frames. ], batch size: 62, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:12:52,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=817150.0, ans=0.0 2024-08-11 00:13:00,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=817150.0, ans=0.0 2024-08-11 00:13:14,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=817250.0, ans=0.125 2024-08-11 00:13:19,528 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 00:14:01,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2024-08-11 00:14:02,180 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-11 00:14:06,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9300, loss[loss=0.1183, beats_loss=0.01164, ecapa_loss=0.0001801, whisper_loss=0.1048, over 23834.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01155, ecapa_loss=0.0002209, whisper_loss=0.09485, over 3949730.15 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:14:16,878 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 00:14:58,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.667e+01 2.966e+01 3.383e+01 7.144e+01, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 00:14:59,984 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-11 00:15:09,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=818050.0, ans=0.1 2024-08-11 00:15:19,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9350, loss[loss=0.1145, beats_loss=0.01034, ecapa_loss=0.0002365, whisper_loss=0.1018, over 22346.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01148, ecapa_loss=0.0002231, whisper_loss=0.09509, over 3921280.92 frames. ], batch size: 91, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:15:36,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=818250.0, ans=0.125 2024-08-11 00:16:09,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=15.0 2024-08-11 00:16:11,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0 2024-08-11 00:16:31,180 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 00:16:32,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9400, loss[loss=0.1047, beats_loss=0.01188, ecapa_loss=0.0001909, whisper_loss=0.09088, over 23899.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01149, ecapa_loss=0.0002225, whisper_loss=0.09495, over 3906003.33 frames. ], batch size: 93, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:16:34,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-11 00:17:09,874 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 00:17:17,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2024-08-11 00:17:20,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.820e+01 3.162e+01 3.777e+01 5.486e+01, threshold=6.323e+01, percent-clipped=0.0 2024-08-11 00:17:27,569 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 00:17:41,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9450, loss[loss=0.1185, beats_loss=0.01172, ecapa_loss=0.0001949, whisper_loss=0.1048, over 23663.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01142, ecapa_loss=0.0002222, whisper_loss=0.09512, over 3898445.25 frames. ], batch size: 91, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:18:11,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=819350.0, ans=0.125 2024-08-11 00:18:24,834 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 00:18:35,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819550.0, ans=0.125 2024-08-11 00:18:48,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-08-11 00:18:48,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9500, loss[loss=0.1011, beats_loss=0.01458, ecapa_loss=0.0002041, whisper_loss=0.08448, over 21962.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01145, ecapa_loss=0.0002217, whisper_loss=0.09491, over 3908701.99 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 35184372088832.0 2024-08-11 00:18:51,741 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 00:18:52,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=819650.0, ans=0.125 2024-08-11 00:18:54,380 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 00:19:07,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=819750.0, ans=0.125 2024-08-11 00:19:10,461 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 00:19:13,270 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
12 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 00:19:36,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=819950.0, ans=0.125 2024-08-11 00:19:37,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.201e+01 2.851e+01 3.283e+01 3.927e+01 7.522e+01, threshold=6.566e+01, percent-clipped=2.0 2024-08-11 00:19:51,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820050.0, ans=0.1 2024-08-11 00:19:58,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9550, loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.0002247, whisper_loss=0.08923, over 21327.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01152, ecapa_loss=0.0002249, whisper_loss=0.09406, over 3860129.42 frames. ], batch size: 87, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:19:59,988 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 00:20:08,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.19 vs. limit=10.0 2024-08-11 00:20:15,147 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:20:15,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=820250.0, ans=0.0 2024-08-11 00:20:21,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=820250.0, ans=0.1 2024-08-11 00:20:27,661 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-11 00:20:28,885 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-11 00:20:36,932 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 00:20:54,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-11 00:20:56,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=820550.0, ans=0.125 2024-08-11 00:20:57,029 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 00:21:00,676 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-11 00:21:04,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9600, loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0002259, whisper_loss=0.09163, over 14594.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01142, ecapa_loss=0.0002254, whisper_loss=0.09458, over 3868260.84 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:21:05,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=820650.0, ans=0.2 2024-08-11 00:21:07,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=820650.0, ans=10.0 2024-08-11 00:21:28,422 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 00:21:50,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.676e+01 3.117e+01 3.565e+01 7.658e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 00:21:58,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=821050.0, ans=0.125 2024-08-11 00:22:04,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=821050.0, ans=0.95 2024-08-11 00:22:05,737 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.063e-02 2024-08-11 00:22:10,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9650, loss[loss=0.09842, beats_loss=0.0133, ecapa_loss=0.0002333, whisper_loss=0.08278, over 21752.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01159, ecapa_loss=0.0002251, whisper_loss=0.09323, over 3869828.70 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:22:19,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=821150.0, ans=0.125 2024-08-11 00:22:20,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=821150.0, ans=0.125 2024-08-11 00:22:31,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=821250.0, ans=0.125 2024-08-11 00:22:45,717 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-11 00:22:54,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821450.0, ans=0.1 2024-08-11 00:23:07,562 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 00:23:10,182 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 00:23:12,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=821550.0, ans=0.0 2024-08-11 00:23:16,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9700, loss[loss=0.1051, beats_loss=0.009119, ecapa_loss=0.000279, whisper_loss=0.09322, over 20591.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01151, ecapa_loss=0.0002252, whisper_loss=0.09393, over 3862533.41 frames. ], batch size: 88, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:23:39,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=821750.0, ans=0.0 2024-08-11 00:23:56,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=821950.0, ans=0.0 2024-08-11 00:24:02,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+01 2.803e+01 3.195e+01 3.718e+01 6.974e+01, threshold=6.391e+01, percent-clipped=1.0 2024-08-11 00:24:18,709 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 00:24:22,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9750, loss[loss=0.08701, beats_loss=0.01349, ecapa_loss=0.000197, whisper_loss=0.07156, over 22553.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01154, ecapa_loss=0.000225, whisper_loss=0.09351, over 3888143.79 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:24:46,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. 
limit=15.0 2024-08-11 00:24:47,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=822350.0, ans=0.0 2024-08-11 00:24:56,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-08-11 00:24:57,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822350.0, ans=0.1 2024-08-11 00:24:59,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=822450.0, ans=0.0 2024-08-11 00:25:07,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=8.0 2024-08-11 00:25:09,606 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 32 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 00:25:26,183 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9800, loss[loss=0.0892, beats_loss=0.01197, ecapa_loss=0.0002553, whisper_loss=0.07468, over 21073.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01164, ecapa_loss=0.0002221, whisper_loss=0.09316, over 3882119.54 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:25:26,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=822650.0, ans=0.0 2024-08-11 00:25:37,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=822650.0, ans=0.0 2024-08-11 00:25:41,118 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 00:25:49,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. 
limit=10.0 2024-08-11 00:25:51,663 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-11 00:26:00,785 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 00:26:07,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822950.0, ans=0.1 2024-08-11 00:26:11,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=822950.0, ans=0.125 2024-08-11 00:26:12,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.728e+01 3.058e+01 3.533e+01 7.097e+01, threshold=6.116e+01, percent-clipped=1.0 2024-08-11 00:26:16,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822950.0, ans=0.125 2024-08-11 00:26:28,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-11 00:26:31,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9850, loss[loss=0.0943, beats_loss=0.01156, ecapa_loss=0.0002342, whisper_loss=0.0804, over 16344.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01159, ecapa_loss=0.0002221, whisper_loss=0.09417, over 3911322.32 frames. 
], batch size: 66, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:26:35,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=823150.0, ans=0.0 2024-08-11 00:26:35,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=823150.0, ans=0.125 2024-08-11 00:26:46,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=823250.0, ans=0.125 2024-08-11 00:27:03,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=823350.0, ans=0.2 2024-08-11 00:27:25,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-08-11 00:27:29,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=823550.0, ans=0.2 2024-08-11 00:27:34,811 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 00:27:37,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9900, loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0002538, whisper_loss=0.08866, over 16069.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01149, ecapa_loss=0.0002234, whisper_loss=0.09451, over 3899075.38 frames. ], batch size: 65, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:27:38,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=823650.0, ans=0.0 2024-08-11 00:27:57,479 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 00:28:00,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=823750.0, ans=0.0 2024-08-11 00:28:01,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=823750.0, ans=0.1 2024-08-11 00:28:04,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=823850.0, ans=0.125 2024-08-11 00:28:05,249 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 00:28:09,192 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 00:28:11,768 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 00:28:14,420 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 00:28:23,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.723e+01 2.993e+01 3.476e+01 9.466e+01, threshold=5.985e+01, percent-clipped=1.0 2024-08-11 00:28:24,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=823950.0, ans=0.0 2024-08-11 00:28:29,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=824050.0, ans=0.0 2024-08-11 00:28:34,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.10 vs. 
limit=22.5 2024-08-11 00:28:37,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=824050.0, ans=0.125 2024-08-11 00:28:43,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 9950, loss[loss=0.1346, beats_loss=0.009017, ecapa_loss=0.0002278, whisper_loss=0.1233, over 18601.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01152, ecapa_loss=0.0002219, whisper_loss=0.09523, over 3902194.46 frames. ], batch size: 71, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:28:43,701 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 00:28:59,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824250.0, ans=0.1 2024-08-11 00:29:03,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824250.0, ans=0.1 2024-08-11 00:29:11,183 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 00:29:21,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=824450.0, ans=0.025 2024-08-11 00:29:28,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-11 00:29:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=824550.0, ans=0.0 2024-08-11 00:29:48,204 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10000, loss[loss=0.1225, beats_loss=0.01071, ecapa_loss=0.0002108, whisper_loss=0.1097, over 22125.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01157, ecapa_loss=0.0002208, whisper_loss=0.09446, over 3862226.55 frames. 
], batch size: 87, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:29:57,462 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 00:30:09,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=824750.0, ans=22.5 2024-08-11 00:30:16,689 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 00:30:30,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824950.0, ans=0.1 2024-08-11 00:30:37,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.691e+01 3.032e+01 3.574e+01 5.004e+01, threshold=6.065e+01, percent-clipped=0.0 2024-08-11 00:30:39,226 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 00:30:48,385 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 00:30:56,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10050, loss[loss=0.1199, beats_loss=0.01288, ecapa_loss=0.0002287, whisper_loss=0.1048, over 22545.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01155, ecapa_loss=0.0002181, whisper_loss=0.09466, over 3867307.61 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 70368744177664.0 2024-08-11 00:30:57,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=825150.0, ans=0.1 2024-08-11 00:31:22,437 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 00:31:22,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=825250.0, ans=0.0 2024-08-11 00:31:27,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=825250.0, ans=0.05 2024-08-11 00:31:43,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=825350.0, ans=0.125 2024-08-11 00:31:43,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=825350.0, ans=0.0 2024-08-11 00:31:48,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=825350.0, ans=0.2 2024-08-11 00:32:35,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10100, loss[loss=0.1177, beats_loss=0.008165, ecapa_loss=0.0002868, whisper_loss=0.1067, over 15527.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01156, ecapa_loss=0.0002201, whisper_loss=0.09468, over 3887375.22 frames. ], batch size: 62, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:32:45,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=825650.0, ans=0.125 2024-08-11 00:32:54,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825650.0, ans=0.1 2024-08-11 00:32:55,544 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 00:33:03,796 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 00:33:06,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.52 vs. 
limit=22.5 2024-08-11 00:33:16,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=825750.0, ans=0.5 2024-08-11 00:33:18,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=825850.0, ans=0.125 2024-08-11 00:33:22,945 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.047e+00 2024-08-11 00:33:55,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.818e+01 3.128e+01 3.591e+01 5.480e+01, threshold=6.256e+01, percent-clipped=0.0 2024-08-11 00:34:00,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-11 00:34:02,603 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 00:34:05,631 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 00:34:05,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=826050.0, ans=0.0 2024-08-11 00:34:14,311 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 00:34:24,126 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 00:34:29,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=826050.0, ans=0.0 2024-08-11 00:34:34,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10150, loss[loss=0.1042, beats_loss=0.01287, ecapa_loss=0.0001887, whisper_loss=0.08943, over 17674.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01159, ecapa_loss=0.0002196, whisper_loss=0.09485, over 3877908.17 frames. 
], batch size: 69, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:34:39,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826150.0, ans=0.1 2024-08-11 00:34:54,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0 2024-08-11 00:35:09,505 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-11 00:35:14,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826250.0, ans=0.1 2024-08-11 00:35:27,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=826350.0, ans=0.125 2024-08-11 00:35:32,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2024-08-11 00:35:41,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2024-08-11 00:36:18,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.62 vs. limit=22.5 2024-08-11 00:36:37,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10200, loss[loss=0.08898, beats_loss=0.01104, ecapa_loss=0.000191, whisper_loss=0.07603, over 20017.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01161, ecapa_loss=0.0002202, whisper_loss=0.094, over 3867380.91 frames. 
], batch size: 78, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:36:41,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=826650.0, ans=0.125 2024-08-11 00:36:54,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-08-11 00:36:58,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=826650.0, ans=0.09899494936611666 2024-08-11 00:37:03,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2024-08-11 00:37:04,763 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 00:37:35,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=826950.0, ans=0.0 2024-08-11 00:37:39,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=826950.0, ans=15.0 2024-08-11 00:37:40,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.717e+01 3.021e+01 3.434e+01 5.708e+01, threshold=6.043e+01, percent-clipped=0.0 2024-08-11 00:37:47,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=827050.0, ans=0.025 2024-08-11 00:38:03,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10250, loss[loss=0.114, beats_loss=0.01188, ecapa_loss=0.0002284, whisper_loss=0.09988, over 18233.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01155, ecapa_loss=0.0002196, whisper_loss=0.09483, over 3866588.64 frames. 
], batch size: 73, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:38:04,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=827150.0, ans=0.0 2024-08-11 00:38:08,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=827150.0, ans=0.125 2024-08-11 00:38:16,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=827150.0, ans=0.125 2024-08-11 00:38:29,540 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 00:38:41,556 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-11 00:38:47,735 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 00:38:51,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=827450.0, ans=0.125 2024-08-11 00:38:55,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827450.0, ans=0.125 2024-08-11 00:39:02,641 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 00:39:19,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10300, loss[loss=0.1219, beats_loss=0.006696, ecapa_loss=0.0003077, whisper_loss=0.1121, over 17989.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01143, ecapa_loss=0.0002202, whisper_loss=0.09488, over 3875058.32 frames. 
], batch size: 76, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:39:21,986 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.655e-03 2024-08-11 00:39:46,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=827750.0, ans=0.0 2024-08-11 00:39:53,768 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-11 00:40:08,974 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 00:40:13,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.682e+01 2.875e+01 3.472e+01 4.715e+01, threshold=5.749e+01, percent-clipped=0.0 2024-08-11 00:40:14,824 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 00:40:24,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=828050.0, ans=0.0 2024-08-11 00:40:27,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828050.0, ans=0.1 2024-08-11 00:40:36,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10350, loss[loss=0.1065, beats_loss=0.01023, ecapa_loss=0.0002045, whisper_loss=0.09425, over 16064.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01145, ecapa_loss=0.0002203, whisper_loss=0.095, over 3878842.16 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:40:36,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=828150.0, ans=0.2 2024-08-11 00:40:43,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=22.5 2024-08-11 00:40:47,862 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 00:40:53,689 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 00:41:02,577 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 00:41:04,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=828350.0, ans=0.125 2024-08-11 00:41:11,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=828350.0, ans=0.125 2024-08-11 00:41:44,739 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 00:41:48,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=828550.0, ans=0.125 2024-08-11 00:41:50,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=828550.0, ans=0.125 2024-08-11 00:41:54,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10400, loss[loss=0.09748, beats_loss=0.01362, ecapa_loss=0.0001859, whisper_loss=0.08199, over 22344.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01153, ecapa_loss=0.000219, whisper_loss=0.09477, over 3873104.79 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:42:16,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-11 00:42:27,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=828850.0, ans=0.0 2024-08-11 00:42:30,403 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
26 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 00:42:49,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.708e+01 2.999e+01 3.498e+01 5.568e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 00:42:59,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=829050.0, ans=0.0 2024-08-11 00:43:01,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=829050.0, ans=0.125 2024-08-11 00:43:14,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10450, loss[loss=0.1095, beats_loss=0.009049, ecapa_loss=0.0002083, whisper_loss=0.0984, over 23867.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0115, ecapa_loss=0.0002169, whisper_loss=0.09422, over 3862507.99 frames. ], batch size: 92, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:43:21,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=829150.0, ans=0.125 2024-08-11 00:43:23,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=829150.0, ans=0.0 2024-08-11 00:43:36,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=829250.0, ans=0.125 2024-08-11 00:43:36,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=829250.0, ans=0.0 2024-08-11 00:43:40,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=829250.0, ans=0.0 2024-08-11 00:43:58,968 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 00:44:04,642 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 00:44:21,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2024-08-11 00:44:27,870 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 00:44:35,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10500, loss[loss=0.09898, beats_loss=0.009583, ecapa_loss=0.0002504, whisper_loss=0.08689, over 13982.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01144, ecapa_loss=0.0002179, whisper_loss=0.09428, over 3857118.14 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:45:20,393 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 00:45:26,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=829950.0, ans=0.125 2024-08-11 00:45:27,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.730e+01 2.985e+01 3.287e+01 5.938e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 00:45:29,603 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:45:38,018 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 00:45:44,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=830050.0, ans=0.0 2024-08-11 00:45:49,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10550, loss[loss=0.0901, beats_loss=0.01384, ecapa_loss=0.000174, whisper_loss=0.07452, over 19749.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01148, ecapa_loss=0.0002178, whisper_loss=0.09402, over 3835539.88 frames. 
], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:46:04,410 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 00:46:17,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2024-08-11 00:46:43,538 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 00:47:04,161 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 00:47:07,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=12.0 2024-08-11 00:47:08,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10600, loss[loss=0.09359, beats_loss=0.01088, ecapa_loss=0.0002384, whisper_loss=0.08033, over 17615.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01148, ecapa_loss=0.0002178, whisper_loss=0.09428, over 3863807.91 frames. ], batch size: 73, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:47:14,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2024-08-11 00:47:22,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.54 vs. limit=5.0 2024-08-11 00:47:24,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=830750.0, ans=0.95 2024-08-11 00:47:43,747 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 00:48:00,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.647e+01 3.131e+01 3.600e+01 5.761e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 00:48:02,366 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 00:48:11,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2024-08-11 00:48:14,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-08-11 00:48:23,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10650, loss[loss=0.1087, beats_loss=0.008522, ecapa_loss=0.000233, whisper_loss=0.09789, over 15829.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01144, ecapa_loss=0.0002165, whisper_loss=0.0944, over 3874072.12 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:48:28,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2024-08-11 00:48:38,519 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 00:49:09,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=831450.0, ans=0.125 2024-08-11 00:49:17,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=831450.0, ans=0.035 2024-08-11 00:49:17,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=831450.0, ans=0.125 2024-08-11 00:49:23,644 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 00:49:25,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831550.0, ans=0.125 2024-08-11 00:49:34,957 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 00:49:40,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10700, loss[loss=0.09588, beats_loss=0.009799, ecapa_loss=0.0001912, whisper_loss=0.08416, over 17601.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01136, ecapa_loss=0.0002159, whisper_loss=0.0958, over 3900165.21 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:49:40,503 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 00:49:43,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=831650.0, ans=0.035 2024-08-11 00:50:07,972 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.565e+00 2024-08-11 00:50:13,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=831850.0, ans=0.2 2024-08-11 00:50:18,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=831850.0, ans=0.0 2024-08-11 00:50:31,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.817e+01 3.065e+01 3.573e+01 8.621e+01, threshold=6.130e+01, percent-clipped=1.0 2024-08-11 00:50:53,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10750, loss[loss=0.1173, beats_loss=0.01008, ecapa_loss=0.0002645, whisper_loss=0.1046, over 20175.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.0114, ecapa_loss=0.0002163, whisper_loss=0.09575, over 3906113.21 frames. 
], batch size: 83, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:51:01,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=832150.0, ans=0.2 2024-08-11 00:51:04,021 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-11 00:51:05,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=832150.0, ans=0.125 2024-08-11 00:51:11,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=832250.0, ans=0.125 2024-08-11 00:51:14,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=832250.0, ans=0.125 2024-08-11 00:51:42,297 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-11 00:51:42,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=832450.0, ans=0.025 2024-08-11 00:52:06,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=12.0 2024-08-11 00:52:11,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10800, loss[loss=0.105, beats_loss=0.0122, ecapa_loss=0.0002336, whisper_loss=0.09048, over 21580.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01151, ecapa_loss=0.0002166, whisper_loss=0.09561, over 3930736.28 frames. 
], batch size: 87, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:52:11,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=832650.0, ans=0.05 2024-08-11 00:52:21,986 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.080e+00 2024-08-11 00:52:44,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=832850.0, ans=10.0 2024-08-11 00:53:04,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.723e+01 3.219e+01 3.827e+01 1.923e+02, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 00:53:05,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-11 00:53:18,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=833050.0, ans=0.125 2024-08-11 00:53:24,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2024-08-11 00:53:26,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10850, loss[loss=0.09739, beats_loss=0.01321, ecapa_loss=0.0002339, whisper_loss=0.08184, over 20022.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01159, ecapa_loss=0.0002174, whisper_loss=0.09462, over 3917005.84 frames. ], batch size: 84, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:53:41,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=833250.0, ans=12.0 2024-08-11 00:53:45,584 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 15 from Vox, 27 from AS 2024-08-11 00:53:53,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=833250.0, ans=0.05 2024-08-11 00:54:04,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2024-08-11 00:54:05,360 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 from AS 2024-08-11 00:54:43,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10900, loss[loss=0.1215, beats_loss=0.01261, ecapa_loss=0.0002039, whisper_loss=0.1068, over 15369.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01156, ecapa_loss=0.0002177, whisper_loss=0.09535, over 3913882.99 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:54:43,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=833650.0, ans=0.0 2024-08-11 00:54:51,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=833650.0, ans=0.0 2024-08-11 00:55:30,107 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 from AS 2024-08-11 00:55:31,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=833950.0, ans=0.125 2024-08-11 00:55:35,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.641e+01 2.975e+01 3.587e+01 5.714e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-11 00:55:38,976 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
29 from LS+wenet, 15 from Vox, 28 from AS 2024-08-11 00:55:43,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=834050.0, ans=0.0 2024-08-11 00:55:52,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834050.0, ans=0.1 2024-08-11 00:55:58,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 10950, loss[loss=0.1132, beats_loss=0.01143, ecapa_loss=0.0001936, whisper_loss=0.09979, over 23441.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01151, ecapa_loss=0.000218, whisper_loss=0.09511, over 3940779.10 frames. ], batch size: 93, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:56:14,677 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 from AS 2024-08-11 00:56:17,128 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 from AS 2024-08-11 00:56:27,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=834350.0, ans=0.0 2024-08-11 00:56:39,871 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 from AS 2024-08-11 00:57:00,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=834550.0, ans=0.125 2024-08-11 00:57:13,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11000, loss[loss=0.116, beats_loss=0.01126, ecapa_loss=0.0002462, whisper_loss=0.1022, over 18801.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01143, ecapa_loss=0.0002189, whisper_loss=0.09533, over 3927775.76 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:57:14,519 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts.
26 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 00:57:14,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=834650.0, ans=0.125 2024-08-11 00:57:14,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=834650.0, ans=0.125 2024-08-11 00:57:28,810 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 from AS 2024-08-11 00:58:06,899 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.341e+01 2.814e+01 3.042e+01 3.466e+01 5.998e+01, threshold=6.084e+01, percent-clipped=1.0 2024-08-11 00:58:30,968 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11050, loss[loss=0.08627, beats_loss=0.01355, ecapa_loss=0.0002502, whisper_loss=0.07022, over 18699.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01143, ecapa_loss=0.0002183, whisper_loss=0.09497, over 3928590.11 frames. ], batch size: 81, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:58:31,442 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.663e-01 2024-08-11 00:58:39,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=835150.0, ans=0.125 2024-08-11 00:58:39,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=835150.0, ans=0.0 2024-08-11 00:59:01,341 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
36 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 00:59:07,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=835350.0, ans=0.95 2024-08-11 00:59:08,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=835350.0, ans=0.1 2024-08-11 00:59:14,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=835350.0, ans=0.125 2024-08-11 00:59:42,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=835550.0, ans=0.125 2024-08-11 00:59:58,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11100, loss[loss=0.1337, beats_loss=0.008726, ecapa_loss=0.0002919, whisper_loss=0.122, over 21400.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01145, ecapa_loss=0.0002192, whisper_loss=0.09495, over 3931581.43 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 00:59:58,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.19 vs. limit=22.5 2024-08-11 01:00:04,770 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-11 01:00:19,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=835750.0, ans=0.125 2024-08-11 01:00:24,734 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 01:00:27,588 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 from AS 2024-08-11 01:00:44,722 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 28 from Vox, 27 from AS 2024-08-11 01:00:48,325 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
26 from LS+wenet, 25 from Vox, 27 from AS 2024-08-11 01:00:48,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835950.0, ans=0.125 2024-08-11 01:00:53,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.772e+01 3.086e+01 3.680e+01 7.620e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 01:01:19,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11150, loss[loss=0.1272, beats_loss=0.01226, ecapa_loss=0.000204, whisper_loss=0.1129, over 23128.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01141, ecapa_loss=0.0002184, whisper_loss=0.09584, over 3952998.08 frames. ], batch size: 93, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:01:24,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-08-11 01:01:41,858 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 21 from Vox, 19 from AS 2024-08-11 01:01:57,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=836350.0, ans=0.125 2024-08-11 01:01:59,947 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 01:02:00,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836350.0, ans=0.1 2024-08-11 01:02:00,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=836350.0, ans=0.125 2024-08-11 01:02:22,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs.
limit=6.0 2024-08-11 01:02:25,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=836550.0, ans=0.0 2024-08-11 01:02:32,389 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 01:02:36,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11200, loss[loss=0.0979, beats_loss=0.01019, ecapa_loss=0.0002676, whisper_loss=0.08502, over 14829.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01138, ecapa_loss=0.0002182, whisper_loss=0.09602, over 3968598.29 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:02:38,138 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 01:03:29,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=15.0 2024-08-11 01:03:35,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.788e+01 3.078e+01 3.604e+01 6.278e+01, threshold=6.156e+01, percent-clipped=2.0 2024-08-11 01:03:54,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=837050.0, ans=0.125 2024-08-11 01:03:55,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=837050.0, ans=0.0 2024-08-11 01:04:00,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11250, loss[loss=0.0903, beats_loss=0.01352, ecapa_loss=0.0002514, whisper_loss=0.07426, over 18515.00 frames. ], tot_loss[loss=0.1093, beats_loss=0.01135, ecapa_loss=0.0002203, whisper_loss=0.09576, over 3951300.68 frames.
], batch size: 80, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:04:17,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=837250.0, ans=0.0 2024-08-11 01:04:33,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=837250.0, ans=0.0 2024-08-11 01:04:33,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837250.0, ans=0.125 2024-08-11 01:05:08,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837550.0, ans=0.1 2024-08-11 01:05:13,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837550.0, ans=0.125 2024-08-11 01:05:15,893 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 27 from Vox, 26 from AS 2024-08-11 01:05:17,651 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 01:05:19,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-11 01:05:25,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11300, loss[loss=0.1085, beats_loss=0.009295, ecapa_loss=0.0002468, whisper_loss=0.09677, over 22004.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0114, ecapa_loss=0.0002192, whisper_loss=0.09508, over 3933348.88 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:05:26,668 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
27 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 01:05:28,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=837650.0, ans=0.015 2024-08-11 01:05:53,184 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 from AS 2024-08-11 01:05:53,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=837750.0, ans=0.0 2024-08-11 01:05:54,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=837750.0, ans=0.125 2024-08-11 01:06:04,684 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 21 from Vox, 18 from AS 2024-08-11 01:06:05,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2024-08-11 01:06:07,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-11 01:06:21,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.704e+01 3.204e+01 3.789e+01 1.454e+02, threshold=6.408e+01, percent-clipped=1.0 2024-08-11 01:06:23,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2024-08-11 01:06:32,154 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 01:06:32,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs.
limit=15.0 2024-08-11 01:06:42,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838050.0, ans=0.1 2024-08-11 01:06:45,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=12.0 2024-08-11 01:06:45,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11350, loss[loss=0.1018, beats_loss=0.0118, ecapa_loss=0.000221, whisper_loss=0.08783, over 21277.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.0113, ecapa_loss=0.0002187, whisper_loss=0.09541, over 3923077.71 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:07:12,466 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 from AS 2024-08-11 01:07:18,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=838350.0, ans=0.0 2024-08-11 01:07:22,691 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-11 01:07:22,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=838350.0, ans=0.125 2024-08-11 01:07:22,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=838350.0, ans=0.1 2024-08-11 01:07:25,367 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 from AS 2024-08-11 01:07:27,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs.
limit=15.0 2024-08-11 01:07:46,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=838550.0, ans=0.025 2024-08-11 01:07:53,025 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 from AS 2024-08-11 01:07:56,690 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 20 from Vox, 27 from AS 2024-08-11 01:08:03,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11400, loss[loss=0.1105, beats_loss=0.01359, ecapa_loss=0.0002855, whisper_loss=0.09401, over 16395.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01128, ecapa_loss=0.0002197, whisper_loss=0.09562, over 3890293.97 frames. ], batch size: 72, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:08:03,514 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 9 from Vox, 36 from AS 2024-08-11 01:08:22,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2024-08-11 01:08:42,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.59 vs. limit=10.0 2024-08-11 01:08:56,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2024-08-11 01:08:58,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.886e+01 3.314e+01 4.166e+01 1.030e+02, threshold=6.628e+01, percent-clipped=1.0 2024-08-11 01:09:13,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=839050.0, ans=0.125 2024-08-11 01:09:20,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11450, loss[loss=0.1194, beats_loss=0.01141, ecapa_loss=0.0001978, whisper_loss=0.106, over 22274.00 frames.
], tot_loss[loss=0.1096, beats_loss=0.01127, ecapa_loss=0.000219, whisper_loss=0.09615, over 3874987.85 frames. ], batch size: 85, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:09:34,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2024-08-11 01:09:37,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=839250.0, ans=0.125 2024-08-11 01:09:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=839250.0, ans=0.125 2024-08-11 01:09:57,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=839350.0, ans=0.125 2024-08-11 01:10:13,165 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 01:10:34,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=839550.0, ans=0.0 2024-08-11 01:10:42,213 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 33 from Vox, 38 from AS 2024-08-11 01:10:44,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11500, loss[loss=0.09199, beats_loss=0.01145, ecapa_loss=0.0002293, whisper_loss=0.07825, over 22371.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.0114, ecapa_loss=0.0002158, whisper_loss=0.09518, over 3872402.59 frames.
], batch size: 91, lr: 1.02e-02, grad_scale: 70368744177664.0 2024-08-11 01:11:00,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839750.0, ans=0.1 2024-08-11 01:11:23,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=839850.0, ans=0.1 2024-08-11 01:11:26,576 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 from AS 2024-08-11 01:11:28,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=839850.0, ans=0.125 2024-08-11 01:11:29,993 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:11:30,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=839850.0, ans=0.0 2024-08-11 01:11:31,378 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts.
31 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 01:11:39,737 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-84000.pt 2024-08-11 01:11:43,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.719e+01 3.134e+01 3.590e+01 4.797e+01, threshold=6.268e+01, percent-clipped=0.0 2024-08-11 01:11:51,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=840050.0, ans=0.125 2024-08-11 01:11:53,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=840050.0, ans=0.2 2024-08-11 01:12:03,283 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 01:12:06,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11550, loss[loss=0.1272, beats_loss=0.008127, ecapa_loss=0.0002379, whisper_loss=0.1167, over 19106.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01135, ecapa_loss=0.0002186, whisper_loss=0.09508, over 3870178.47 frames. ], batch size: 74, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:12:09,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=840150.0, ans=0.125 2024-08-11 01:12:13,104 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 from AS 2024-08-11 01:12:23,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=840250.0, ans=0.125 2024-08-11 01:12:24,872 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 01:12:34,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840250.0, ans=0.1 2024-08-11 01:12:41,124 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 from AS 2024-08-11 01:12:42,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=840350.0, ans=0.0 2024-08-11 01:12:52,899 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 from AS 2024-08-11 01:12:59,360 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 18 from Vox, 24 from AS 2024-08-11 01:13:07,471 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS 2024-08-11 01:13:21,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2024-08-11 01:13:22,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840550.0, ans=0.1 2024-08-11 01:13:27,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11600, loss[loss=0.1019, beats_loss=0.01299, ecapa_loss=0.0002342, whisper_loss=0.08657, over 22289.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01145, ecapa_loss=0.000219, whisper_loss=0.09488, over 3893576.91 frames. ], batch size: 93, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:13:29,382 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
29 from LS+wenet, 27 from Vox, 38 from AS 2024-08-11 01:13:34,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=840650.0, ans=0.2 2024-08-11 01:13:35,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=840650.0, ans=0.125 2024-08-11 01:13:59,287 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 from AS 2024-08-11 01:14:01,744 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS 2024-08-11 01:14:03,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840850.0, ans=0.1 2024-08-11 01:14:04,813 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-11 01:14:11,048 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 29 from LS+wenet, 21 from Vox, 20 from AS 2024-08-11 01:14:13,197 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.215e+02 2024-08-11 01:14:16,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=840950.0, ans=0.125 2024-08-11 01:14:20,659 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 22 from Vox, 42 from AS 2024-08-11 01:14:23,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+01 2.786e+01 3.126e+01 3.591e+01 6.008e+01, threshold=6.251e+01, percent-clipped=0.0 2024-08-11 01:14:46,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11650, loss[loss=0.102, beats_loss=0.009837, ecapa_loss=0.0002005, whisper_loss=0.09013, over 19788.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01143, ecapa_loss=0.000219, whisper_loss=0.09532, over 3892538.18 frames.
], batch size: 78, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:15:17,172 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 31 from Vox, 36 from AS 2024-08-11 01:15:28,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841350.0, ans=0.1 2024-08-11 01:15:43,890 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 from AS 2024-08-11 01:15:44,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=841450.0, ans=0.125 2024-08-11 01:15:57,204 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 13 from Vox, 37 from AS 2024-08-11 01:16:05,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11700, loss[loss=0.1293, beats_loss=0.0109, ecapa_loss=0.0002356, whisper_loss=0.1161, over 22600.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01153, ecapa_loss=0.0002182, whisper_loss=0.09487, over 3916157.46 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 140737488355328.0 2024-08-11 01:16:25,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5 2024-08-11 01:16:27,359 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 from AS 2024-08-11 01:16:35,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.23 vs.
limit=15.0 2024-08-11 01:16:55,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841950.0, ans=0.1 2024-08-11 01:16:59,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.883e+01 3.187e+01 3.882e+01 5.856e+01, threshold=6.374e+01, percent-clipped=0.0 2024-08-11 01:17:15,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=842050.0, ans=0.125 2024-08-11 01:17:23,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11750, loss[loss=0.1042, beats_loss=0.009207, ecapa_loss=0.0002983, whisper_loss=0.09199, over 12140.00 frames. ], tot_loss[loss=0.109, beats_loss=0.01148, ecapa_loss=0.0002175, whisper_loss=0.09535, over 3928660.98 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:17:43,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=842250.0, ans=0.125 2024-08-11 01:17:46,478 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 from AS 2024-08-11 01:17:59,558 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 01:17:59,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=842350.0, ans=0.0 2024-08-11 01:18:24,469 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 from AS 2024-08-11 01:18:40,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11800, loss[loss=0.09917, beats_loss=0.0145, ecapa_loss=0.0001416, whisper_loss=0.08325, over 22611.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01161, ecapa_loss=0.0002162, whisper_loss=0.09489, over 3939867.47 frames.
], batch size: 86, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:18:44,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=842650.0, ans=0.125 2024-08-11 01:18:56,973 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 from AS 2024-08-11 01:19:05,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=12.0 2024-08-11 01:19:09,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=842750.0, ans=0.015 2024-08-11 01:19:09,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=842750.0, ans=0.0 2024-08-11 01:19:38,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+01 2.831e+01 3.248e+01 3.772e+01 8.461e+01, threshold=6.495e+01, percent-clipped=3.0 2024-08-11 01:19:53,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=843050.0, ans=0.125 2024-08-11 01:19:53,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=843050.0, ans=0.125 2024-08-11 01:19:58,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0 2024-08-11 01:20:03,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11850, loss[loss=0.1304, beats_loss=0.008188, ecapa_loss=0.0002808, whisper_loss=0.1194, over 21621.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01154, ecapa_loss=0.0002176, whisper_loss=0.09542, over 3936565.26 frames.
], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:20:40,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=843350.0, ans=0.125 2024-08-11 01:20:42,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-11 01:20:59,144 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 01:21:00,645 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 01:21:08,284 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 01:21:10,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=843550.0, ans=0.125 2024-08-11 01:21:16,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2024-08-11 01:21:20,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11900, loss[loss=0.1297, beats_loss=0.009111, ecapa_loss=0.0002658, whisper_loss=0.1179, over 17658.00 frames. ], tot_loss[loss=0.1091, beats_loss=0.01157, ecapa_loss=0.0002187, whisper_loss=0.09538, over 3925035.87 frames. ], batch size: 71, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:21:22,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=843650.0, ans=0.0 2024-08-11 01:21:25,496 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 01:21:36,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=843750.0, ans=0.125 2024-08-11 01:21:50,085 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 01:21:52,919 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 01:21:59,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=843850.0, ans=0.0 2024-08-11 01:22:00,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=843850.0, ans=0.125 2024-08-11 01:22:13,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.860e+01 3.257e+01 3.543e+01 6.146e+01, threshold=6.513e+01, percent-clipped=0.0 2024-08-11 01:22:33,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2024-08-11 01:22:34,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 11950, loss[loss=0.1221, beats_loss=0.01043, ecapa_loss=0.0002131, whisper_loss=0.1096, over 22448.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01151, ecapa_loss=0.0002186, whisper_loss=0.09502, over 3912149.64 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:22:35,167 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 01:22:37,454 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-11 01:22:41,022 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 01:22:54,826 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 01:23:06,069 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 01:23:09,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=844350.0, ans=0.2 2024-08-11 01:23:11,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=12.0 2024-08-11 01:23:17,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0 2024-08-11 01:23:20,934 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 01:23:27,571 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 01:23:38,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=844550.0, ans=0.125 2024-08-11 01:23:46,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=844550.0, ans=0.09899494936611666 2024-08-11 01:23:47,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=844550.0, ans=0.125 2024-08-11 01:23:53,699 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12000, loss[loss=0.1206, beats_loss=0.01079, ecapa_loss=0.0002053, whisper_loss=0.1078, over 16479.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01149, ecapa_loss=0.0002164, whisper_loss=0.09511, over 3914048.49 frames. 
], batch size: 63, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:23:53,701 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 01:24:32,709 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on ASR_libri: loss=0.2603, beats_loss=0, ecapa_loss=0.0006879, whisper_loss=0.2534, over 922467.00 frames. 2024-08-11 01:24:50,866 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on SV_voxceleb1: loss=0.005764, beats_loss=0, ecapa_loss=0.0005764, whisper_loss=0, over 939242.00 frames. 2024-08-11 01:26:40,322 INFO [train_multi_KD3.py:1149] (0/4) Epoch 6, validation on AT_audioset: loss=0.02599, beats_loss=0.02599, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 01:26:40,326 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 01:27:01,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=844750.0, ans=0.125 2024-08-11 01:27:04,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=844750.0, ans=0.0 2024-08-11 01:27:19,408 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 01:27:24,074 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 01:27:25,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=844850.0, ans=0.1 2024-08-11 01:27:35,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.989e+01 3.252e+01 3.842e+01 6.267e+01, threshold=6.505e+01, percent-clipped=0.0 2024-08-11 01:27:59,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. 
limit=15.0 2024-08-11 01:28:00,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12050, loss[loss=0.1137, beats_loss=0.01274, ecapa_loss=0.0001701, whisper_loss=0.09925, over 23783.00 frames. ], tot_loss[loss=0.1089, beats_loss=0.01155, ecapa_loss=0.0002145, whisper_loss=0.09521, over 3933091.89 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:28:13,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=845150.0, ans=0.0 2024-08-11 01:28:16,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=845250.0, ans=0.2 2024-08-11 01:28:32,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=845350.0, ans=10.0 2024-08-11 01:28:35,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=845350.0, ans=0.125 2024-08-11 01:28:54,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-08-11 01:29:10,503 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 01:29:17,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12100, loss[loss=0.1, beats_loss=0.01184, ecapa_loss=0.0001666, whisper_loss=0.08653, over 16442.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01157, ecapa_loss=0.0002166, whisper_loss=0.0944, over 3889957.44 frames. ], batch size: 61, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:29:17,536 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 01:29:20,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=845650.0, ans=0.125 2024-08-11 01:29:37,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=12.0 2024-08-11 01:29:49,783 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 18 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 01:29:51,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845850.0, ans=0.0 2024-08-11 01:30:01,358 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 01:30:10,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.614e+01 2.881e+01 3.224e+01 5.170e+01, threshold=5.763e+01, percent-clipped=0.0 2024-08-11 01:30:20,365 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 01:30:32,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12150, loss[loss=0.1064, beats_loss=0.0127, ecapa_loss=0.0001899, whisper_loss=0.09177, over 21155.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01153, ecapa_loss=0.0002168, whisper_loss=0.0944, over 3890105.06 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:30:36,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846150.0, ans=0.1 2024-08-11 01:30:50,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.10 vs. 
limit=15.0 2024-08-11 01:30:59,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2024-08-11 01:31:04,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=846350.0, ans=0.0 2024-08-11 01:31:06,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0 2024-08-11 01:31:22,698 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 01:31:37,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=846550.0, ans=0.125 2024-08-11 01:31:49,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12200, loss[loss=0.1058, beats_loss=0.00991, ecapa_loss=0.000274, whisper_loss=0.0932, over 13353.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01151, ecapa_loss=0.0002169, whisper_loss=0.09469, over 3880251.22 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:31:51,398 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 01:31:56,232 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 01:32:02,768 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 01:32:07,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=846750.0, ans=0.0 2024-08-11 01:32:08,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=846750.0, ans=0.125 2024-08-11 01:32:08,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=846750.0, ans=0.125 2024-08-11 01:32:10,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=846750.0, ans=0.125 2024-08-11 01:32:11,633 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 01:32:21,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=846850.0, ans=0.0 2024-08-11 01:32:25,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=846850.0, ans=0.125 2024-08-11 01:32:27,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-08-11 01:32:36,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=846950.0, ans=0.125 2024-08-11 01:32:43,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.785e+01 3.124e+01 3.706e+01 5.181e+01, threshold=6.248e+01, percent-clipped=0.0 2024-08-11 01:33:08,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12250, loss[loss=0.1142, beats_loss=0.01283, ecapa_loss=0.0001982, whisper_loss=0.09938, over 22504.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01144, ecapa_loss=0.0002171, whisper_loss=0.09492, over 3857264.27 frames. 
], batch size: 89, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:34:03,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=847450.0, ans=0.0 2024-08-11 01:34:08,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847450.0, ans=0.1 2024-08-11 01:34:23,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=847550.0, ans=0.125 2024-08-11 01:34:28,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12300, loss[loss=0.1114, beats_loss=0.01236, ecapa_loss=0.000198, whisper_loss=0.09703, over 18680.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.0115, ecapa_loss=0.0002176, whisper_loss=0.09492, over 3866526.15 frames. ], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:34:36,567 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 01:34:41,089 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.451e-02 2024-08-11 01:34:51,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=847750.0, ans=0.125 2024-08-11 01:35:11,357 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 01:35:18,999 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 01:35:24,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.835e+01 3.125e+01 3.646e+01 6.261e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 01:35:43,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=848050.0, ans=0.09899494936611666 2024-08-11 01:35:47,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12350, loss[loss=0.104, beats_loss=0.01237, ecapa_loss=0.0002395, whisper_loss=0.08919, over 22079.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01145, ecapa_loss=0.0002189, whisper_loss=0.0947, over 3882172.13 frames. ], batch size: 93, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:35:50,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-08-11 01:35:51,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=848150.0, ans=0.0 2024-08-11 01:36:14,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=848250.0, ans=0.125 2024-08-11 01:36:19,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848350.0, ans=0.1 2024-08-11 01:36:23,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848350.0, ans=0.1 2024-08-11 01:36:35,823 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 01:36:37,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=848450.0, ans=0.2 2024-08-11 01:36:42,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-11 01:36:50,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=848550.0, ans=0.2 2024-08-11 01:36:52,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=848550.0, ans=0.125 2024-08-11 01:36:56,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=848550.0, ans=0.125 2024-08-11 01:37:02,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12400, loss[loss=0.09071, beats_loss=0.01252, ecapa_loss=0.0002226, whisper_loss=0.07597, over 22338.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.0114, ecapa_loss=0.0002203, whisper_loss=0.0947, over 3902695.16 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:37:20,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=848750.0, ans=0.0 2024-08-11 01:37:32,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=848850.0, ans=0.125 2024-08-11 01:37:37,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=848850.0, ans=0.0 2024-08-11 01:37:49,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=848950.0, ans=0.1 2024-08-11 01:37:52,083 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
12 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 01:37:52,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=848950.0, ans=0.07 2024-08-11 01:37:54,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.646e+01 2.993e+01 3.533e+01 4.877e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 01:37:57,796 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 01:38:00,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2024-08-11 01:38:02,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=12.0 2024-08-11 01:38:08,328 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 01:38:14,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=849050.0, ans=0.125 2024-08-11 01:38:17,169 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12450, loss[loss=0.1074, beats_loss=0.01133, ecapa_loss=0.0002091, whisper_loss=0.09399, over 23077.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002189, whisper_loss=0.09368, over 3901962.59 frames. ], batch size: 91, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:38:17,314 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 38 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 01:38:40,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=849250.0, ans=0.125 2024-08-11 01:38:44,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2024-08-11 01:38:48,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=849350.0, ans=0.07 2024-08-11 01:39:07,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-11 01:39:15,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=849450.0, ans=0.125 2024-08-11 01:39:18,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=849550.0, ans=0.5 2024-08-11 01:39:22,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=849550.0, ans=0.125 2024-08-11 01:39:23,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=849550.0, ans=0.125 2024-08-11 01:39:25,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=849550.0, ans=0.1 2024-08-11 01:39:28,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=849550.0, ans=0.125 2024-08-11 01:39:32,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12500, loss[loss=0.1097, beats_loss=0.01138, ecapa_loss=0.0002178, whisper_loss=0.09612, over 20073.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01144, ecapa_loss=0.0002179, whisper_loss=0.094, over 3890521.43 frames. 
], batch size: 83, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:39:37,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=849650.0, ans=0.04949747468305833 2024-08-11 01:39:57,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=849750.0, ans=0.125 2024-08-11 01:39:58,926 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 01:40:24,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=849950.0, ans=0.2 2024-08-11 01:40:27,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=849950.0, ans=0.2 2024-08-11 01:40:28,764 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.816e+01 3.133e+01 3.791e+01 6.148e+01, threshold=6.266e+01, percent-clipped=1.0 2024-08-11 01:40:40,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=850050.0, ans=0.125 2024-08-11 01:40:51,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12550, loss[loss=0.1165, beats_loss=0.01128, ecapa_loss=0.000229, whisper_loss=0.1029, over 22552.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01148, ecapa_loss=0.0002176, whisper_loss=0.09367, over 3894378.04 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:41:13,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=850250.0, ans=0.125 2024-08-11 01:41:19,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=850250.0, ans=0.0 2024-08-11 01:41:45,020 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 01:42:01,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-11 01:42:04,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=850550.0, ans=0.0 2024-08-11 01:42:10,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12600, loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0002341, whisper_loss=0.08983, over 16387.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01149, ecapa_loss=0.0002184, whisper_loss=0.09392, over 3869152.40 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:43:05,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=850950.0, ans=0.0 2024-08-11 01:43:06,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+01 2.981e+01 3.398e+01 4.026e+01 7.168e+01, threshold=6.796e+01, percent-clipped=1.0 2024-08-11 01:43:20,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=851050.0, ans=0.125 2024-08-11 01:43:27,086 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.907e+00 2024-08-11 01:43:29,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12650, loss[loss=0.09922, beats_loss=0.01582, ecapa_loss=0.0001743, whisper_loss=0.08166, over 15825.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01158, ecapa_loss=0.0002189, whisper_loss=0.09342, over 3854234.39 frames. 
], batch size: 65, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:43:48,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851250.0, ans=0.125 2024-08-11 01:43:49,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=851250.0, ans=0.125 2024-08-11 01:43:54,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=851250.0, ans=0.125 2024-08-11 01:44:00,763 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 01:44:33,090 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 01:44:45,585 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 01:44:48,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12700, loss[loss=0.09542, beats_loss=0.01207, ecapa_loss=0.000221, whisper_loss=0.08114, over 16708.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01167, ecapa_loss=0.0002171, whisper_loss=0.09362, over 3860578.29 frames. ], batch size: 68, lr: 1.01e-02, grad_scale: 140737488355328.0 2024-08-11 01:45:00,785 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 01:45:08,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851750.0, ans=0.125 2024-08-11 01:45:08,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=851750.0, ans=0.125 2024-08-11 01:45:27,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=851850.0, ans=0.07 2024-08-11 01:45:29,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=851850.0, ans=0.2 2024-08-11 01:45:40,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.734e+01 2.989e+01 3.425e+01 5.621e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-11 01:45:40,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=851950.0, ans=0.0 2024-08-11 01:45:45,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=851950.0, ans=0.125 2024-08-11 01:45:59,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852050.0, ans=0.1 2024-08-11 01:46:04,389 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12750, loss[loss=0.1047, beats_loss=0.01131, ecapa_loss=0.000255, whisper_loss=0.09087, over 17835.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.0117, ecapa_loss=0.0002176, whisper_loss=0.09377, over 3890470.10 frames. 
], batch size: 72, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:46:14,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=852150.0, ans=0.125
2024-08-11 01:46:15,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852150.0, ans=0.0
2024-08-11 01:46:20,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852250.0, ans=0.1
2024-08-11 01:46:23,389 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 from AS
2024-08-11 01:46:35,569 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 from AS
2024-08-11 01:46:37,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0
2024-08-11 01:46:42,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=852350.0, ans=0.0
2024-08-11 01:46:55,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852450.0, ans=0.1
2024-08-11 01:47:10,797 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 from AS
2024-08-11 01:47:12,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=852550.0, ans=0.0
2024-08-11 01:47:19,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12800, loss[loss=0.06918, beats_loss=0.0139, ecapa_loss=0.0001666, whisper_loss=0.05361, over 13274.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01172, ecapa_loss=0.0002177, whisper_loss=0.09334, over 3849353.59 frames. ], batch size: 53, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:47:23,594 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 from AS
2024-08-11 01:47:28,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=852650.0, ans=0.125
2024-08-11 01:47:43,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=852750.0, ans=0.125
2024-08-11 01:47:48,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852850.0, ans=0.1
2024-08-11 01:47:51,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=852850.0, ans=0.0
2024-08-11 01:47:55,841 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 from AS
2024-08-11 01:48:00,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=852850.0, ans=0.125
2024-08-11 01:48:01,390 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS
2024-08-11 01:48:09,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.804e+01 3.213e+01 3.707e+01 6.106e+01, threshold=6.425e+01, percent-clipped=1.0
2024-08-11 01:48:17,728 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 from AS
2024-08-11 01:48:20,879 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 from AS
2024-08-11 01:48:29,343 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 from AS
2024-08-11 01:48:30,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12850, loss[loss=0.09507, beats_loss=0.01281, ecapa_loss=0.0002069, whisper_loss=0.08019, over 22284.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01171, ecapa_loss=0.000217, whisper_loss=0.0932, over 3854844.27 frames. ], batch size: 93, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:48:34,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=853150.0, ans=0.125
2024-08-11 01:48:59,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=853350.0, ans=0.125
2024-08-11 01:49:05,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=853350.0, ans=0.125
2024-08-11 01:49:06,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. limit=10.0
2024-08-11 01:49:30,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=22.5
2024-08-11 01:49:38,824 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 from AS
2024-08-11 01:49:41,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12900, loss[loss=0.09881, beats_loss=0.01255, ecapa_loss=0.0001892, whisper_loss=0.08436, over 23293.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01174, ecapa_loss=0.0002171, whisper_loss=0.09245, over 3867430.08 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:49:42,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0
2024-08-11 01:49:52,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=853650.0, ans=0.125
2024-08-11 01:50:21,304 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 from AS
2024-08-11 01:50:29,747 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 22 from Vox, 36 from AS
2024-08-11 01:50:32,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.643e+01 2.887e+01 3.353e+01 5.409e+01, threshold=5.774e+01, percent-clipped=0.0
2024-08-11 01:50:55,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 12950, loss[loss=0.1091, beats_loss=0.009089, ecapa_loss=0.0002691, whisper_loss=0.09736, over 21685.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01169, ecapa_loss=0.0002159, whisper_loss=0.09262, over 3841857.70 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:51:09,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0
2024-08-11 01:51:18,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=854250.0, ans=0.125
2024-08-11 01:51:34,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854350.0, ans=0.125
2024-08-11 01:51:36,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0
2024-08-11 01:51:55,184 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS
2024-08-11 01:51:58,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=854550.0, ans=0.2
2024-08-11 01:52:03,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2024-08-11 01:52:04,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=854550.0, ans=15.0
2024-08-11 01:52:10,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854650.0, ans=0.1
2024-08-11 01:52:11,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13000, loss[loss=0.1225, beats_loss=0.009713, ecapa_loss=0.0001975, whisper_loss=0.1108, over 19797.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01164, ecapa_loss=0.000216, whisper_loss=0.09325, over 3871601.76 frames. ], batch size: 74, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:52:16,873 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS
2024-08-11 01:52:20,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=854650.0, ans=0.0
2024-08-11 01:52:32,675 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 from AS
2024-08-11 01:52:51,518 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 13 from LS+wenet, 20 from Vox, 22 from AS
2024-08-11 01:52:58,772 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 23 from Vox, 31 from AS
2024-08-11 01:53:03,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854950.0, ans=0.125
2024-08-11 01:53:06,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.697e+01 3.039e+01 3.659e+01 7.134e+01, threshold=6.079e+01, percent-clipped=1.0
2024-08-11 01:53:13,744 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 from AS
2024-08-11 01:53:18,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=855050.0, ans=0.0
2024-08-11 01:53:29,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13050, loss[loss=0.1341, beats_loss=0.01198, ecapa_loss=0.0002447, whisper_loss=0.1197, over 23505.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01164, ecapa_loss=0.000218, whisper_loss=0.09335, over 3870455.37 frames. ], batch size: 94, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:53:31,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=855150.0, ans=0.125
2024-08-11 01:53:34,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=855150.0, ans=0.2
2024-08-11 01:54:01,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=855350.0, ans=0.1
2024-08-11 01:54:02,915 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 17 from Vox, 27 from AS
2024-08-11 01:54:09,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=855350.0, ans=0.2
2024-08-11 01:54:10,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=855350.0, ans=0.2
2024-08-11 01:54:21,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=855450.0, ans=0.125
2024-08-11 01:54:24,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=855450.0, ans=0.5
2024-08-11 01:54:28,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=855450.0, ans=0.0
2024-08-11 01:54:47,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13100, loss[loss=0.1448, beats_loss=0.008559, ecapa_loss=0.0002609, whisper_loss=0.1336, over 21309.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01159, ecapa_loss=0.0002166, whisper_loss=0.0946, over 3916917.96 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:54:53,057 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 from AS
2024-08-11 01:55:07,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-08-11 01:55:10,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=855750.0, ans=0.0
2024-08-11 01:55:24,989 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 from AS
2024-08-11 01:55:33,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=855950.0, ans=0.125
2024-08-11 01:55:44,246 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.870e+01 3.154e+01 3.850e+01 5.715e+01, threshold=6.308e+01, percent-clipped=0.0
2024-08-11 01:55:57,112 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.665e+05
2024-08-11 01:55:58,297 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 01:56:05,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856050.0, ans=0.0
2024-08-11 01:56:08,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13150, loss[loss=0.1225, beats_loss=0.009785, ecapa_loss=0.0002084, whisper_loss=0.1106, over 23848.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01153, ecapa_loss=0.0002158, whisper_loss=0.09502, over 3895724.02 frames. ], batch size: 92, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:56:13,664 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 from AS
2024-08-11 01:56:21,662 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 27 from Vox, 31 from AS
2024-08-11 01:56:28,169 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 from AS
2024-08-11 01:56:32,894 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 from AS
2024-08-11 01:56:42,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=856350.0, ans=0.2
2024-08-11 01:57:04,147 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 30 from Vox, 39 from AS
2024-08-11 01:57:06,115 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 21 from Vox, 47 from AS
2024-08-11 01:57:06,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856450.0, ans=0.1
2024-08-11 01:57:10,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=856550.0, ans=0.1
2024-08-11 01:57:14,839 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 18 from Vox, 21 from AS
2024-08-11 01:57:25,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13200, loss[loss=0.1087, beats_loss=0.01142, ecapa_loss=0.0002335, whisper_loss=0.09494, over 17134.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01149, ecapa_loss=0.0002159, whisper_loss=0.09514, over 3893844.28 frames. ], batch size: 68, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:57:31,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2024-08-11 01:57:36,537 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 25 from LS+wenet, 8 from Vox, 26 from AS
2024-08-11 01:58:08,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=856950.0, ans=0.5
2024-08-11 01:58:15,907 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 22 from Vox, 19 from AS
2024-08-11 01:58:17,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.805e+01 3.191e+01 3.827e+01 5.209e+01, threshold=6.381e+01, percent-clipped=0.0
2024-08-11 01:58:17,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=856950.0, ans=0.0
2024-08-11 01:58:19,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=856950.0, ans=0.125
2024-08-11 01:58:27,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=857050.0, ans=0.0
2024-08-11 01:58:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=857050.0, ans=0.125
2024-08-11 01:58:38,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13250, loss[loss=0.08928, beats_loss=0.01307, ecapa_loss=0.0001946, whisper_loss=0.07427, over 17332.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01161, ecapa_loss=0.0002157, whisper_loss=0.09373, over 3855304.08 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:58:51,238 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 from AS
2024-08-11 01:59:06,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.97 vs. limit=8.0
2024-08-11 01:59:06,618 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 from AS
2024-08-11 01:59:19,681 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 01:59:20,616 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 24 from Vox, 27 from AS
2024-08-11 01:59:27,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=857450.0, ans=0.125
2024-08-11 01:59:35,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=857550.0, ans=0.125
2024-08-11 01:59:42,359 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 18 from Vox, 40 from AS
2024-08-11 01:59:47,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=857650.0, ans=0.0
2024-08-11 01:59:49,412 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13300, loss[loss=0.08667, beats_loss=0.01467, ecapa_loss=0.0001808, whisper_loss=0.07019, over 19024.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01165, ecapa_loss=0.000214, whisper_loss=0.09336, over 3832869.85 frames. ], batch size: 77, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 01:59:58,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857650.0, ans=0.1
2024-08-11 02:00:00,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=857650.0, ans=15.0
2024-08-11 02:00:03,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=857750.0, ans=0.125
2024-08-11 02:00:15,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=857750.0, ans=0.1
2024-08-11 02:00:16,442 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 from AS
2024-08-11 02:00:27,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=857850.0, ans=0.0
2024-08-11 02:00:37,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.722e+01 2.995e+01 3.352e+01 6.535e+01, threshold=5.989e+01, percent-clipped=1.0
2024-08-11 02:00:38,968 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 from AS
2024-08-11 02:00:46,021 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 20 from Vox, 16 from AS
2024-08-11 02:00:57,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13350, loss[loss=0.1288, beats_loss=0.009721, ecapa_loss=0.0001969, whisper_loss=0.1171, over 19268.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0116, ecapa_loss=0.0002137, whisper_loss=0.09414, over 3846830.24 frames. ], batch size: 73, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 02:01:07,801 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 11 from Vox, 31 from AS
2024-08-11 02:01:44,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858450.0, ans=0.1
2024-08-11 02:01:48,296 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 from AS
2024-08-11 02:01:49,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=858450.0, ans=0.0
2024-08-11 02:02:00,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.62 vs. limit=22.5
2024-08-11 02:02:04,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13400, loss[loss=0.0898, beats_loss=0.01579, ecapa_loss=0.0002005, whisper_loss=0.072, over 15882.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01161, ecapa_loss=0.0002132, whisper_loss=0.09414, over 3811851.95 frames. ], batch size: 66, lr: 1.01e-02, grad_scale: 140737488355328.0
2024-08-11 02:02:22,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=858750.0, ans=0.0
2024-08-11 02:02:35,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0
2024-08-11 02:02:43,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=858950.0, ans=0.0
2024-08-11 02:02:44,855 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 from AS
2024-08-11 02:02:46,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858950.0, ans=0.125
2024-08-11 02:02:51,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.837e+01 3.208e+01 3.826e+01 8.458e+01, threshold=6.417e+01, percent-clipped=4.0
2024-08-11 02:02:54,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=858950.0, ans=0.09899494936611666
2024-08-11 02:02:54,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=858950.0, ans=0.125
2024-08-11 02:03:11,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13450, loss[loss=0.09695, beats_loss=0.0118, ecapa_loss=0.0002517, whisper_loss=0.08263, over 15958.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01163, ecapa_loss=0.000215, whisper_loss=0.09434, over 3850624.93 frames. ], batch size: 66, lr: 1.00e-02, grad_scale: 140737488355328.0
2024-08-11 02:03:31,187 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 13 from Vox, 40 from AS
2024-08-11 02:03:33,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5
2024-08-11 02:04:07,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=859550.0, ans=0.0
2024-08-11 02:04:16,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=859650.0, ans=0.0
2024-08-11 02:04:18,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13500, loss[loss=0.0924, beats_loss=0.01382, ecapa_loss=0.0002004, whisper_loss=0.07657, over 21953.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0117, ecapa_loss=0.0002146, whisper_loss=0.09355, over 3866760.08 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 140737488355328.0
2024-08-11 02:04:38,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=859750.0, ans=0.0
2024-08-11 02:04:54,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=859850.0, ans=0.125
2024-08-11 02:05:00,944 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 32 from LS+wenet, 16 from Vox, 25 from AS
2024-08-11 02:05:04,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.793e+01 3.249e+01 3.860e+01 6.225e+01, threshold=6.498e+01, percent-clipped=0.0
2024-08-11 02:05:11,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=860050.0, ans=0.025
2024-08-11 02:05:14,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=860050.0, ans=0.0
2024-08-11 02:05:14,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0
2024-08-11 02:05:24,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13550, loss[loss=0.1199, beats_loss=0.01159, ecapa_loss=0.0002172, whisper_loss=0.1061, over 14568.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01164, ecapa_loss=0.0002159, whisper_loss=0.09432, over 3862372.97 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:05:27,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.18 vs. limit=15.0
2024-08-11 02:05:37,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=860250.0, ans=0.125
2024-08-11 02:05:52,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860350.0, ans=0.1
2024-08-11 02:05:56,736 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS
2024-08-11 02:06:11,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=860450.0, ans=0.0
2024-08-11 02:06:16,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=860450.0, ans=0.025
2024-08-11 02:06:30,182 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 from AS
2024-08-11 02:06:34,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13600, loss[loss=0.08778, beats_loss=0.013, ecapa_loss=0.0002685, whisper_loss=0.07209, over 21240.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01162, ecapa_loss=0.0002166, whisper_loss=0.09446, over 3877235.30 frames. ], batch size: 95, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:06:52,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=860750.0, ans=0.2
2024-08-11 02:06:57,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=860750.0, ans=0.125
2024-08-11 02:07:04,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=860850.0, ans=0.025
2024-08-11 02:07:21,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0
2024-08-11 02:07:23,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.998e+01 3.369e+01 4.005e+01 6.707e+01, threshold=6.738e+01, percent-clipped=1.0
2024-08-11 02:07:23,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=860950.0, ans=0.2
2024-08-11 02:07:32,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=861050.0, ans=0.05
2024-08-11 02:07:36,306 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 from AS
2024-08-11 02:07:38,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.56 vs. limit=22.5
2024-08-11 02:07:44,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13650, loss[loss=0.107, beats_loss=0.01247, ecapa_loss=0.0002229, whisper_loss=0.09226, over 20053.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01165, ecapa_loss=0.0002162, whisper_loss=0.0943, over 3874622.62 frames. ], batch size: 82, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:07:47,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=861150.0, ans=0.125
2024-08-11 02:07:52,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=861150.0, ans=0.125
2024-08-11 02:07:56,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=861250.0, ans=0.125
2024-08-11 02:08:02,579 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 from AS
2024-08-11 02:08:05,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=861250.0, ans=0.125
2024-08-11 02:08:16,247 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 from AS
2024-08-11 02:08:28,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5
2024-08-11 02:08:37,294 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 from AS
2024-08-11 02:08:38,686 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 02:08:42,938 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 from AS
2024-08-11 02:08:50,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=861550.0, ans=0.125
2024-08-11 02:08:54,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13700, loss[loss=0.1138, beats_loss=0.01113, ecapa_loss=0.0002573, whisper_loss=0.1001, over 17812.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01166, ecapa_loss=0.0002163, whisper_loss=0.09464, over 3881168.72 frames. ], batch size: 75, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:08:56,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=861650.0, ans=0.2
2024-08-11 02:09:01,384 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 02:09:04,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861650.0, ans=0.0
2024-08-11 02:09:34,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=861850.0, ans=0.125
2024-08-11 02:09:35,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0
2024-08-11 02:09:37,487 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 from AS
2024-08-11 02:09:44,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.742e+01 3.072e+01 3.573e+01 1.415e+02, threshold=6.145e+01, percent-clipped=1.0
2024-08-11 02:09:52,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862050.0, ans=0.1
2024-08-11 02:09:54,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=862050.0, ans=0.0
2024-08-11 02:10:01,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=862050.0, ans=0.0
2024-08-11 02:10:05,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13750, loss[loss=0.1132, beats_loss=0.01204, ecapa_loss=0.0002079, whisper_loss=0.09911, over 18343.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01156, ecapa_loss=0.0002163, whisper_loss=0.09485, over 3862366.25 frames. ], batch size: 72, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:10:08,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862150.0, ans=0.1
2024-08-11 02:10:36,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=862350.0, ans=0.125
2024-08-11 02:10:41,269 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 02:10:49,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=862450.0, ans=0.125
2024-08-11 02:11:05,891 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 26 from Vox, 18 from AS
2024-08-11 02:11:14,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13800, loss[loss=0.1127, beats_loss=0.01273, ecapa_loss=0.0002417, whisper_loss=0.09754, over 19197.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01146, ecapa_loss=0.0002169, whisper_loss=0.09483, over 3840841.89 frames. ], batch size: 78, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:11:18,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0
2024-08-11 02:11:31,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=862750.0, ans=0.05
2024-08-11 02:11:41,725 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 17 from Vox, 31 from AS
2024-08-11 02:11:46,783 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 02:12:02,546 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 from AS
2024-08-11 02:12:04,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.614e+01 2.961e+01 3.435e+01 1.383e+02, threshold=5.922e+01, percent-clipped=1.0
2024-08-11 02:12:08,315 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 from AS
2024-08-11 02:12:09,594 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS
2024-08-11 02:12:14,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0
2024-08-11 02:12:26,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13850, loss[loss=0.1021, beats_loss=0.01231, ecapa_loss=0.0002117, whisper_loss=0.08773, over 15965.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01147, ecapa_loss=0.000217, whisper_loss=0.09479, over 3872386.17 frames. ], batch size: 64, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:12:58,053 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 from AS
2024-08-11 02:13:00,680 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 from AS
2024-08-11 02:13:04,724 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 from AS
2024-08-11 02:13:25,609 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 15 from Vox, 35 from AS
2024-08-11 02:13:26,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=863550.0, ans=22.5
2024-08-11 02:13:36,770 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13900, loss[loss=0.08757, beats_loss=0.0108, ecapa_loss=0.000273, whisper_loss=0.07404, over 16261.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01149, ecapa_loss=0.0002166, whisper_loss=0.09468, over 3885654.41 frames. ], batch size: 70, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:13:42,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=863650.0, ans=0.125
2024-08-11 02:13:59,311 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 from AS
2024-08-11 02:14:00,795 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 from AS
2024-08-11 02:14:03,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=863850.0, ans=0.0
2024-08-11 02:14:10,906 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 15 from Vox, 33 from AS
2024-08-11 02:14:14,766 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS
2024-08-11 02:14:15,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=863950.0, ans=0.2
2024-08-11 02:14:23,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.760e+01 3.035e+01 3.739e+01 6.215e+01, threshold=6.069e+01, percent-clipped=1.0
2024-08-11 02:14:24,595 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 29 from Vox, 30 from AS
2024-08-11 02:14:35,760 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS
2024-08-11 02:14:42,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 13950, loss[loss=0.119, beats_loss=0.01121, ecapa_loss=0.000198, whisper_loss=0.1058, over 23201.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01163, ecapa_loss=0.0002157, whisper_loss=0.0944, over 3898988.35 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:14:43,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=864150.0, ans=0.04949747468305833
2024-08-11 02:14:47,434 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 from AS
2024-08-11 02:14:55,072 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 from AS
2024-08-11 02:15:04,474 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 10 from Vox, 28 from AS
2024-08-11 02:15:05,730 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 from AS
2024-08-11 02:15:06,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864250.0, ans=0.1
2024-08-11 02:15:12,095 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 17 from Vox, 34 from AS
2024-08-11 02:15:19,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=864350.0, ans=0.0
2024-08-11 02:15:21,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0
2024-08-11 02:15:46,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=864650.0, ans=0.125
2024-08-11 02:15:47,374 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14000, loss[loss=0.1113, beats_loss=0.01116, ecapa_loss=0.0001856, whisper_loss=0.09829, over 20075.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01159, ecapa_loss=0.000214, whisper_loss=0.09407, over 3888903.07 frames. ], batch size: 75, lr: 1.00e-02, grad_scale: 281474976710656.0
2024-08-11 02:15:50,392 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
37 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 02:16:33,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.879e+01 3.227e+01 3.709e+01 6.302e+01, threshold=6.454e+01, percent-clipped=1.0 2024-08-11 02:16:36,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=10.0 2024-08-11 02:16:42,620 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 02:16:52,843 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14050, loss[loss=0.1185, beats_loss=0.01147, ecapa_loss=0.000258, whisper_loss=0.1044, over 20976.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01166, ecapa_loss=0.0002126, whisper_loss=0.09417, over 3899414.90 frames. ], batch size: 87, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:17:01,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=865150.0, ans=15.0 2024-08-11 02:17:02,025 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 02:17:18,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=865350.0, ans=0.125 2024-08-11 02:17:22,767 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 02:17:36,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=865450.0, ans=0.125 2024-08-11 02:17:57,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14100, loss[loss=0.1071, beats_loss=0.0111, ecapa_loss=0.0002766, whisper_loss=0.09323, over 21005.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01166, ecapa_loss=0.0002126, whisper_loss=0.09444, over 3907751.42 frames. 
], batch size: 90, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:17:59,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=865650.0, ans=0.0 2024-08-11 02:18:16,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-11 02:18:41,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=865950.0, ans=0.0 2024-08-11 02:18:44,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.713e+01 2.992e+01 3.543e+01 5.369e+01, threshold=5.983e+01, percent-clipped=0.0 2024-08-11 02:19:04,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14150, loss[loss=0.1233, beats_loss=0.01023, ecapa_loss=0.0002089, whisper_loss=0.111, over 15518.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01171, ecapa_loss=0.0002126, whisper_loss=0.09475, over 3892922.31 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:19:07,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=866150.0, ans=0.125 2024-08-11 02:19:08,815 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-11 02:19:15,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=866150.0, ans=0.125 2024-08-11 02:19:38,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. 
limit=15.0 2024-08-11 02:19:38,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=866350.0, ans=15.0 2024-08-11 02:19:42,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=866350.0, ans=15.0 2024-08-11 02:19:56,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=866550.0, ans=0.125 2024-08-11 02:19:56,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866550.0, ans=0.125 2024-08-11 02:19:56,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2024-08-11 02:20:01,662 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 02:20:04,304 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 02:20:10,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14200, loss[loss=0.1186, beats_loss=0.01021, ecapa_loss=0.0002097, whisper_loss=0.1063, over 21532.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01165, ecapa_loss=0.0002128, whisper_loss=0.09419, over 3895896.15 frames. ], batch size: 84, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:20:13,353 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 02:20:14,613 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 02:20:16,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.02 vs. 
limit=6.0 2024-08-11 02:20:19,532 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 02:20:19,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=866650.0, ans=0.125 2024-08-11 02:20:48,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=866850.0, ans=0.125 2024-08-11 02:20:54,999 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 02:20:57,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.819e+01 3.173e+01 3.823e+01 7.553e+01, threshold=6.347e+01, percent-clipped=1.0 2024-08-11 02:21:10,691 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-11 02:21:19,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14250, loss[loss=0.09024, beats_loss=0.01254, ecapa_loss=0.0002199, whisper_loss=0.07551, over 20693.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01161, ecapa_loss=0.0002154, whisper_loss=0.09372, over 3887862.53 frames. ], batch size: 83, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:21:21,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-11 02:21:22,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=867150.0, ans=0.125 2024-08-11 02:21:23,663 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 02:21:29,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.69 vs. 
limit=22.5 2024-08-11 02:21:55,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-11 02:21:57,245 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 02:21:58,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=867450.0, ans=0.125 2024-08-11 02:22:01,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=867450.0, ans=0.0 2024-08-11 02:22:10,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=867450.0, ans=0.125 2024-08-11 02:22:10,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-08-11 02:22:14,531 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 02:22:16,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.23 vs. limit=15.0 2024-08-11 02:22:26,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14300, loss[loss=0.1134, beats_loss=0.01259, ecapa_loss=0.0002114, whisper_loss=0.09873, over 19115.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01151, ecapa_loss=0.0002122, whisper_loss=0.09484, over 3894250.50 frames. ], batch size: 77, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:22:44,276 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 02:22:45,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=867750.0, ans=0.125 2024-08-11 02:23:06,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867950.0, ans=0.1 2024-08-11 02:23:11,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.633e+01 2.947e+01 3.319e+01 6.322e+01, threshold=5.893e+01, percent-clipped=0.0 2024-08-11 02:23:19,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-11 02:23:22,357 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 02:23:31,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14350, loss[loss=0.08563, beats_loss=0.01398, ecapa_loss=0.0001816, whisper_loss=0.06983, over 17662.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01157, ecapa_loss=0.0002126, whisper_loss=0.09416, over 3904533.82 frames. ], batch size: 71, lr: 1.00e-02, grad_scale: 281474976710656.0 2024-08-11 02:23:32,577 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-11 02:23:56,670 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 02:23:57,816 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 02:23:59,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-11 02:24:06,910 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 02:24:25,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0 2024-08-11 02:24:33,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=868550.0, ans=0.125 2024-08-11 02:24:33,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-11 02:24:35,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14400, loss[loss=0.113, beats_loss=0.01162, ecapa_loss=0.0001852, whisper_loss=0.09955, over 22671.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01152, ecapa_loss=0.0002127, whisper_loss=0.09508, over 3930794.23 frames. ], batch size: 89, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:24:35,898 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 02:24:54,039 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-11 02:25:06,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=868850.0, ans=0.0 2024-08-11 02:25:21,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.691e+01 3.158e+01 3.511e+01 8.025e+01, threshold=6.317e+01, percent-clipped=1.0 2024-08-11 02:25:22,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.63 vs. 
limit=12.0 2024-08-11 02:25:24,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=868950.0, ans=0.2 2024-08-11 02:25:26,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=22.5 2024-08-11 02:25:33,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=869050.0, ans=0.125 2024-08-11 02:25:40,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 6, batch 14450, loss[loss=0.1244, beats_loss=0.01109, ecapa_loss=0.0002091, whisper_loss=0.1112, over 23910.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01155, ecapa_loss=0.0002143, whisper_loss=0.09477, over 3924802.27 frames. ], batch size: 93, lr: 9.99e-03, grad_scale: 281474976710656.0 2024-08-11 02:25:51,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869150.0, ans=0.1 2024-08-11 02:25:52,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869250.0, ans=0.125 2024-08-11 02:25:57,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=869250.0, ans=0.0 2024-08-11 02:25:59,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=869250.0, ans=10.0 2024-08-11 02:26:14,816 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 02:26:25,981 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 02:26:36,861 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-6.pt 2024-08-11 02:27:16,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 0, loss[loss=0.1452, beats_loss=0.009607, ecapa_loss=0.0002719, whisper_loss=0.1329, over 19406.00 frames. ], tot_loss[loss=0.1452, beats_loss=0.009607, ecapa_loss=0.0002719, whisper_loss=0.1329, over 19406.00 frames. ], batch size: 76, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:27:16,276 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 02:28:00,261 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006864, whisper_loss=0.2518, over 922467.00 frames. 2024-08-11 02:28:07,801 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0462, 5.7044, 4.8962, 5.5300], device='cuda:0') 2024-08-11 02:28:18,663 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on SV_voxceleb1: loss=0.00579, beats_loss=0, ecapa_loss=0.000579, whisper_loss=0, over 939242.00 frames. 2024-08-11 02:30:27,687 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on AT_audioset: loss=0.02579, beats_loss=0.02579, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 02:30:27,690 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 02:30:32,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=869590.0, ans=0.125 2024-08-11 02:30:52,383 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 02:31:00,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869690.0, ans=0.1 2024-08-11 02:31:05,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=869690.0, ans=0.0 2024-08-11 02:31:23,337 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 02:31:27,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=869790.0, ans=0.09899494936611666 2024-08-11 02:31:27,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=869790.0, ans=0.0 2024-08-11 02:32:23,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=869890.0, ans=0.125 2024-08-11 02:32:35,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.976e+01 3.314e+01 3.996e+01 6.220e+01, threshold=6.628e+01, percent-clipped=0.0 2024-08-11 02:33:11,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 50, loss[loss=0.122, beats_loss=0.008299, ecapa_loss=0.000272, whisper_loss=0.111, over 19807.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01109, ecapa_loss=0.0002215, whisper_loss=0.09123, over 872603.59 frames. ], batch size: 81, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:33:14,448 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 02:33:58,352 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-11 02:34:02,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870190.0, ans=0.1 2024-08-11 02:34:12,048 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 02:34:28,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=870290.0, ans=10.0 2024-08-11 02:34:45,478 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 02:34:56,113 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 02:34:56,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=870290.0, ans=0.0 2024-08-11 02:36:00,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=870490.0, ans=0.125 2024-08-11 02:36:02,471 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 02:36:18,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 100, loss[loss=0.1015, beats_loss=0.01114, ecapa_loss=0.0002375, whisper_loss=0.08802, over 19857.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01107, ecapa_loss=0.0002198, whisper_loss=0.09163, over 1520116.36 frames. ], batch size: 82, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:36:18,391 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 02:36:19,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=15.0 2024-08-11 02:36:28,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=870590.0, ans=0.125 2024-08-11 02:36:33,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=870590.0, ans=0.0 2024-08-11 02:36:44,195 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-11 02:37:28,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=870790.0, ans=0.0 2024-08-11 02:37:29,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-08-11 02:37:49,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=870790.0, ans=0.0 2024-08-11 02:37:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=870790.0, ans=0.0 2024-08-11 02:37:52,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.64 vs. limit=10.0 2024-08-11 02:38:13,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=870890.0, ans=0.125 2024-08-11 02:38:20,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=870890.0, ans=0.0 2024-08-11 02:38:22,280 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 02:38:32,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+01 3.124e+01 3.380e+01 3.805e+01 6.032e+01, threshold=6.760e+01, percent-clipped=0.0 2024-08-11 02:38:32,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=870990.0, ans=0.0 2024-08-11 02:38:33,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0 2024-08-11 02:38:34,715 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 02:38:37,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=870990.0, ans=0.125 2024-08-11 02:38:49,765 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 150, loss[loss=0.1025, beats_loss=0.0112, ecapa_loss=0.0002138, whisper_loss=0.08913, over 18243.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01129, ecapa_loss=0.0002167, whisper_loss=0.09185, over 2043748.41 frames. ], batch size: 72, lr: 9.36e-03, grad_scale: 281474976710656.0 2024-08-11 02:39:14,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=871190.0, ans=0.0 2024-08-11 02:39:39,243 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-11 02:39:46,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=871390.0, ans=0.025 2024-08-11 02:39:50,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=15.0 2024-08-11 02:40:06,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871490.0, ans=0.0 2024-08-11 02:40:06,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871490.0, ans=0.1 2024-08-11 02:40:16,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 200, loss[loss=0.08656, beats_loss=0.01312, ecapa_loss=0.0001735, whisper_loss=0.0717, over 13964.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0002163, whisper_loss=0.09337, over 2439647.86 frames. ], batch size: 53, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:40:18,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871590.0, ans=0.1 2024-08-11 02:40:20,903 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 02:40:23,893 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 02:40:45,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=871690.0, ans=0.5 2024-08-11 02:40:51,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=871790.0, ans=0.05 2024-08-11 02:41:20,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=871990.0, ans=0.125 2024-08-11 02:41:21,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+01 2.791e+01 3.109e+01 3.398e+01 1.022e+02, threshold=6.218e+01, percent-clipped=1.0 2024-08-11 02:41:23,236 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 02:41:35,595 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 250, loss[loss=0.1116, beats_loss=0.01154, ecapa_loss=0.0002127, whisper_loss=0.09794, over 22378.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01113, ecapa_loss=0.000215, whisper_loss=0.09413, over 2741623.75 frames. ], batch size: 89, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:41:43,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.73 vs. limit=22.5 2024-08-11 02:41:58,534 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 02:42:20,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=872290.0, ans=0.125 2024-08-11 02:42:26,120 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-11 02:42:28,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=872390.0, ans=0.125 2024-08-11 02:42:57,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 300, loss[loss=0.1086, beats_loss=0.01072, ecapa_loss=0.0001813, whisper_loss=0.09609, over 21311.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01123, ecapa_loss=0.0002137, whisper_loss=0.09386, over 3009727.71 frames. 
], batch size: 85, lr: 9.35e-03, grad_scale: 281474976710656.0 2024-08-11 02:43:03,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872590.0, ans=0.1 2024-08-11 02:43:11,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=872690.0, ans=0.0 2024-08-11 02:43:28,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=872790.0, ans=0.1 2024-08-11 02:43:51,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-11 02:43:53,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=872890.0, ans=0.125 2024-08-11 02:43:59,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=872990.0, ans=0.07 2024-08-11 02:44:02,254 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.643e+01 2.910e+01 3.334e+01 5.693e+01, threshold=5.820e+01, percent-clipped=0.0 2024-08-11 02:44:14,529 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 02:44:15,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2024-08-11 02:44:15,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 350, loss[loss=0.1072, beats_loss=0.01166, ecapa_loss=0.0002254, whisper_loss=0.09331, over 18212.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01128, ecapa_loss=0.0002101, whisper_loss=0.09434, over 3219946.31 frames. 
], batch size: 74, lr: 9.34e-03, grad_scale: 281474976710656.0
2024-08-11 02:44:16,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=873090.0, ans=0.0
2024-08-11 02:44:17,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=873090.0, ans=0.125
2024-08-11 02:44:52,748 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS
2024-08-11 02:45:05,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=873390.0, ans=0.0
2024-08-11 02:45:17,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=873490.0, ans=0.0
2024-08-11 02:45:23,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873490.0, ans=0.125
2024-08-11 02:45:27,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=873490.0, ans=0.125
2024-08-11 02:45:27,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0
2024-08-11 02:45:32,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 400, loss[loss=0.09558, beats_loss=0.01126, ecapa_loss=0.00019, whisper_loss=0.08242, over 23755.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01132, ecapa_loss=0.0002078, whisper_loss=0.09388, over 3380981.55 frames. ], batch size: 94, lr: 9.34e-03, grad_scale: 281474976710656.0
2024-08-11 02:45:43,677 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
29 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 02:45:43,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873590.0, ans=0.125
2024-08-11 02:45:50,296 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 21 from Vox, 31 from AS
2024-08-11 02:45:53,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=873690.0, ans=0.07
2024-08-11 02:45:58,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=873690.0, ans=0.125
2024-08-11 02:46:03,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=873790.0, ans=0.015
2024-08-11 02:46:18,833 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 from AS
2024-08-11 02:46:35,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.580e+01 2.895e+01 3.398e+01 1.445e+02, threshold=5.790e+01, percent-clipped=1.0
2024-08-11 02:46:41,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=873990.0, ans=0.0
2024-08-11 02:46:48,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 450, loss[loss=0.09419, beats_loss=0.009256, ecapa_loss=0.0002296, whisper_loss=0.08263, over 19488.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01124, ecapa_loss=0.000209, whisper_loss=0.09411, over 3487952.13 frames. ], batch size: 79, lr: 9.34e-03, grad_scale: 281474976710656.0
2024-08-11 02:47:02,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.45 vs.
limit=22.5
2024-08-11 02:47:11,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=874190.0, ans=0.125
2024-08-11 02:47:21,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=874290.0, ans=0.2
2024-08-11 02:47:21,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=874290.0, ans=0.125
2024-08-11 02:47:40,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=874390.0, ans=0.04949747468305833
2024-08-11 02:47:51,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5
2024-08-11 02:47:53,704 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 from AS
2024-08-11 02:47:55,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=874490.0, ans=0.125
2024-08-11 02:47:57,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=874490.0, ans=0.05
2024-08-11 02:48:02,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 500, loss[loss=0.1005, beats_loss=0.01194, ecapa_loss=0.0001746, whisper_loss=0.08677, over 19365.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.0002061, whisper_loss=0.09348, over 3600167.64 frames. ], batch size: 74, lr: 9.34e-03, grad_scale: 281474976710656.0
2024-08-11 02:48:18,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874690.0, ans=0.1
2024-08-11 02:48:22,430 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
21 from LS+wenet, 22 from Vox, 45 from AS
2024-08-11 02:48:22,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=874690.0, ans=0.0
2024-08-11 02:48:32,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874790.0, ans=0.0
2024-08-11 02:48:37,510 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 from AS
2024-08-11 02:48:49,270 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 25 from Vox, 26 from AS
2024-08-11 02:48:54,507 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 from AS
2024-08-11 02:48:58,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.783e+01 3.369e+01 3.762e+01 6.753e+01, threshold=6.739e+01, percent-clipped=3.0
2024-08-11 02:48:58,969 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.591e-02
2024-08-11 02:49:02,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=874990.0, ans=0.0
2024-08-11 02:49:09,288 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS
2024-08-11 02:49:10,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 550, loss[loss=0.1085, beats_loss=0.01189, ecapa_loss=0.0002289, whisper_loss=0.09434, over 22195.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01122, ecapa_loss=0.0002078, whisper_loss=0.09328, over 3641149.66 frames.
], batch size: 91, lr: 9.33e-03, grad_scale: 281474976710656.0
2024-08-11 02:49:20,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=875090.0, ans=0.0
2024-08-11 02:49:22,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=875190.0, ans=0.05
2024-08-11 02:49:25,104 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 15 from Vox, 26 from AS
2024-08-11 02:49:27,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875190.0, ans=0.1
2024-08-11 02:49:40,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875290.0, ans=0.1
2024-08-11 02:49:55,141 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 from AS
2024-08-11 02:49:55,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=875390.0, ans=0.125
2024-08-11 02:49:56,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=12.0
2024-08-11 02:49:58,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875390.0, ans=0.125
2024-08-11 02:50:00,095 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 from AS
2024-08-11 02:50:01,432 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
25 from LS+wenet, 27 from Vox, 27 from AS
2024-08-11 02:50:08,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=875490.0, ans=0.05
2024-08-11 02:50:15,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 600, loss[loss=0.1138, beats_loss=0.01123, ecapa_loss=0.0001983, whisper_loss=0.1006, over 22298.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01124, ecapa_loss=0.000206, whisper_loss=0.0939, over 3704548.13 frames. ], batch size: 89, lr: 9.33e-03, grad_scale: 281474976710656.0
2024-08-11 02:50:19,701 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS
2024-08-11 02:50:19,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=875590.0, ans=0.0
2024-08-11 02:50:20,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=22.5
2024-08-11 02:50:21,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=875590.0, ans=0.125
2024-08-11 02:50:35,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=875690.0, ans=0.125
2024-08-11 02:50:54,674 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS
2024-08-11 02:51:03,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs.
limit=15.0
2024-08-11 02:51:09,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.091e+01 2.703e+01 3.008e+01 3.347e+01 4.794e+01, threshold=6.016e+01, percent-clipped=0.0
2024-08-11 02:51:09,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=875990.0, ans=0.125
2024-08-11 02:51:14,428 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 30 from Vox, 39 from AS
2024-08-11 02:51:14,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875990.0, ans=0.125
2024-08-11 02:51:20,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 650, loss[loss=0.1019, beats_loss=0.01077, ecapa_loss=0.0001945, whisper_loss=0.08915, over 15325.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01128, ecapa_loss=0.0002063, whisper_loss=0.09312, over 3699084.92 frames. ], batch size: 59, lr: 9.33e-03, grad_scale: 281474976710656.0
2024-08-11 02:51:27,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=876090.0, ans=0.125
2024-08-11 02:51:28,855 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 27 from Vox, 26 from AS
2024-08-11 02:51:32,696 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 from AS
2024-08-11 02:52:00,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876390.0, ans=0.125
2024-08-11 02:52:08,000 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-11 02:52:08,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=876390.0, ans=0.125
2024-08-11 02:52:26,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 700, loss[loss=0.1086, beats_loss=0.0121, ecapa_loss=0.0002029, whisper_loss=0.09444, over 22549.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01127, ecapa_loss=0.000207, whisper_loss=0.09299, over 3721716.56 frames. ], batch size: 90, lr: 9.33e-03, grad_scale: 281474976710656.0
2024-08-11 02:52:35,828 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 21 from Vox, 30 from AS
2024-08-11 02:52:37,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.45 vs. limit=10.0
2024-08-11 02:52:39,628 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 from AS
2024-08-11 02:52:51,453 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-11 02:52:58,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=876790.0, ans=0.035
2024-08-11 02:53:13,251 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 11 from LS+wenet, 16 from Vox, 39 from AS
2024-08-11 02:53:18,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=876990.0, ans=0.125
2024-08-11 02:53:19,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.223e+01 2.856e+01 3.234e+01 3.790e+01 5.945e+01, threshold=6.469e+01, percent-clipped=0.0
2024-08-11 02:53:31,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 750, loss[loss=0.08821, beats_loss=0.01215, ecapa_loss=0.0001872, whisper_loss=0.07419, over 18789.00 frames.
], tot_loss[loss=0.106, beats_loss=0.01138, ecapa_loss=0.000205, whisper_loss=0.09254, over 3747684.57 frames. ], batch size: 76, lr: 9.32e-03, grad_scale: 281474976710656.0
2024-08-11 02:53:38,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=877090.0, ans=0.125
2024-08-11 02:53:52,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0
2024-08-11 02:53:53,529 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS
2024-08-11 02:53:58,531 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 21 from Vox, 33 from AS
2024-08-11 02:54:20,790 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 from AS
2024-08-11 02:54:36,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 800, loss[loss=0.1034, beats_loss=0.01178, ecapa_loss=0.0001928, whisper_loss=0.08974, over 15978.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01136, ecapa_loss=0.0002059, whisper_loss=0.09244, over 3766698.25 frames. ], batch size: 60, lr: 9.32e-03, grad_scale: 281474976710656.0
2024-08-11 02:54:46,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=877590.0, ans=0.125
2024-08-11 02:55:25,907 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 from AS
2024-08-11 02:55:29,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.644e+01 2.972e+01 3.441e+01 7.984e+01, threshold=5.944e+01, percent-clipped=1.0
2024-08-11 02:55:35,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.35 vs.
limit=22.5
2024-08-11 02:55:40,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878090.0, ans=0.1
2024-08-11 02:55:41,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 850, loss[loss=0.1266, beats_loss=0.01019, ecapa_loss=0.0001842, whisper_loss=0.1146, over 22676.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01141, ecapa_loss=0.0002054, whisper_loss=0.09238, over 3772430.88 frames. ], batch size: 86, lr: 9.32e-03, grad_scale: 281474976710656.0
2024-08-11 02:55:44,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=878090.0, ans=0.125
2024-08-11 02:55:53,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878190.0, ans=0.125
2024-08-11 02:56:00,842 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 from AS
2024-08-11 02:56:22,944 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.334e+00
2024-08-11 02:56:26,800 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.939e-01
2024-08-11 02:56:28,930 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 from AS
2024-08-11 02:56:30,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=878390.0, ans=0.125
2024-08-11 02:56:43,068 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts.
18 from LS+wenet, 16 from Vox, 26 from AS
2024-08-11 02:56:44,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878590.0, ans=0.1
2024-08-11 02:56:45,628 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 900, loss[loss=0.1081, beats_loss=0.0086, ecapa_loss=0.0002426, whisper_loss=0.09712, over 14041.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01143, ecapa_loss=0.0002048, whisper_loss=0.09186, over 3749983.32 frames. ], batch size: 56, lr: 9.32e-03, grad_scale: 281474976710656.0
2024-08-11 02:56:46,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=878590.0, ans=0.125
2024-08-11 02:56:47,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=878590.0, ans=0.125
2024-08-11 02:57:04,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=878690.0, ans=0.125
2024-08-11 02:57:13,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=878790.0, ans=0.2
2024-08-11 02:57:21,201 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 from AS
2024-08-11 02:57:28,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5
2024-08-11 02:57:29,097 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 from AS
2024-08-11 02:57:30,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=878890.0, ans=0.125
2024-08-11 02:57:32,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs.
limit=15.0
2024-08-11 02:57:38,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.614e+01 2.988e+01 3.449e+01 5.810e+01, threshold=5.976e+01, percent-clipped=0.0
2024-08-11 02:57:41,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=878990.0, ans=0.2
2024-08-11 02:57:51,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 950, loss[loss=0.1095, beats_loss=0.01212, ecapa_loss=0.000199, whisper_loss=0.09543, over 20629.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01148, ecapa_loss=0.0002038, whisper_loss=0.09183, over 3793931.27 frames. ], batch size: 82, lr: 9.31e-03, grad_scale: 281474976710656.0
2024-08-11 02:57:52,901 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 02:57:54,205 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 from AS
2024-08-11 02:57:56,763 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 from AS
2024-08-11 02:57:57,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=879090.0, ans=0.125
2024-08-11 02:58:14,901 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS
2024-08-11 02:58:19,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879290.0, ans=0.1
2024-08-11 02:58:22,154 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
21 from LS+wenet, 25 from Vox, 42 from AS
2024-08-11 02:58:33,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=879390.0, ans=0.125
2024-08-11 02:58:41,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=879390.0, ans=0.0
2024-08-11 02:58:45,740 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 31 from Vox, 26 from AS
2024-08-11 02:58:49,378 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 from AS
2024-08-11 02:58:51,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=879490.0, ans=0.0
2024-08-11 02:58:54,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=879490.0, ans=0.0
2024-08-11 02:58:57,096 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 from AS
2024-08-11 02:58:58,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.41 vs. limit=22.5
2024-08-11 02:59:00,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1000, loss[loss=0.1122, beats_loss=0.01304, ecapa_loss=0.0001523, whisper_loss=0.0976, over 16374.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01152, ecapa_loss=0.0002028, whisper_loss=0.09148, over 3811113.59 frames. ], batch size: 61, lr: 9.31e-03, grad_scale: 281474976710656.0
2024-08-11 02:59:24,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs.
limit=22.5
2024-08-11 02:59:27,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=879790.0, ans=0.125
2024-08-11 02:59:31,931 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 from AS
2024-08-11 02:59:52,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=879890.0, ans=0.125
2024-08-11 02:59:57,795 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-88000.pt
2024-08-11 03:00:01,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.796e+01 3.092e+01 3.418e+01 4.355e+01, threshold=6.184e+01, percent-clipped=0.0
2024-08-11 03:00:02,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=879990.0, ans=0.0
2024-08-11 03:00:03,112 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 20 from Vox, 23 from AS
2024-08-11 03:00:06,869 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS
2024-08-11 03:00:08,548 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-11 03:00:12,683 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 from AS
2024-08-11 03:00:12,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=880090.0, ans=0.125
2024-08-11 03:00:13,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1050, loss[loss=0.09492, beats_loss=0.0126, ecapa_loss=0.0001654, whisper_loss=0.08066, over 15557.00 frames.
], tot_loss[loss=0.1055, beats_loss=0.01143, ecapa_loss=0.0002032, whisper_loss=0.092, over 3841485.69 frames. ], batch size: 61, lr: 9.31e-03, grad_scale: 562949953421312.0
2024-08-11 03:00:19,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0
2024-08-11 03:00:25,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=880090.0, ans=0.0
2024-08-11 03:00:43,207 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 from AS
2024-08-11 03:00:47,738 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 24 from Vox, 41 from AS
2024-08-11 03:01:02,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=880390.0, ans=0.125
2024-08-11 03:01:06,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0
2024-08-11 03:01:20,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880490.0, ans=0.1
2024-08-11 03:01:27,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1100, loss[loss=0.1, beats_loss=0.01169, ecapa_loss=0.0002277, whisper_loss=0.08606, over 20400.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01142, ecapa_loss=0.000202, whisper_loss=0.09277, over 3855976.59 frames. ], batch size: 83, lr: 9.31e-03, grad_scale: 562949953421312.0
2024-08-11 03:01:43,594 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
29 from LS+wenet, 22 from Vox, 27 from AS
2024-08-11 03:02:00,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=880790.0, ans=0.125
2024-08-11 03:02:08,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=880790.0, ans=0.125
2024-08-11 03:02:27,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.648e+01 3.166e+01 3.461e+01 5.758e+01, threshold=6.333e+01, percent-clipped=0.0
2024-08-11 03:02:36,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=880990.0, ans=0.2
2024-08-11 03:02:37,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2024-08-11 03:02:38,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.047e-02
2024-08-11 03:02:40,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1150, loss[loss=0.1095, beats_loss=0.01241, ecapa_loss=0.0001843, whisper_loss=0.09529, over 22559.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0002026, whisper_loss=0.09244, over 3850842.87 frames. ], batch size: 88, lr: 9.30e-03, grad_scale: 562949953421312.0
2024-08-11 03:02:41,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.61 vs.
limit=12.0
2024-08-11 03:02:42,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=881090.0, ans=0.05
2024-08-11 03:02:44,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=881090.0, ans=0.125
2024-08-11 03:02:46,820 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 from AS
2024-08-11 03:03:07,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=881190.0, ans=0.0
2024-08-11 03:03:14,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881290.0, ans=0.1
2024-08-11 03:03:18,213 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 from AS
2024-08-11 03:03:26,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=881390.0, ans=15.0
2024-08-11 03:03:27,192 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 03:03:32,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=881390.0, ans=0.125
2024-08-11 03:03:46,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=881490.0, ans=0.125
2024-08-11 03:03:52,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1200, loss[loss=0.07773, beats_loss=0.01213, ecapa_loss=0.0001336, whisper_loss=0.06426, over 15311.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01149, ecapa_loss=0.0002008, whisper_loss=0.09237, over 3827053.71 frames.
], batch size: 58, lr: 9.30e-03, grad_scale: 562949953421312.0
2024-08-11 03:03:58,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=881590.0, ans=0.0
2024-08-11 03:04:35,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=881890.0, ans=0.0
2024-08-11 03:04:37,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881890.0, ans=0.125
2024-08-11 03:04:51,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=12.0
2024-08-11 03:04:52,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.507e+01 2.887e+01 3.348e+01 4.586e+01, threshold=5.774e+01, percent-clipped=0.0
2024-08-11 03:05:02,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=881990.0, ans=0.125
2024-08-11 03:05:05,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1250, loss[loss=0.09103, beats_loss=0.01469, ecapa_loss=0.0001394, whisper_loss=0.07495, over 14102.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01152, ecapa_loss=0.0002002, whisper_loss=0.09255, over 3809332.30 frames.
], batch size: 55, lr: 9.30e-03, grad_scale: 562949953421312.0
2024-08-11 03:05:40,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=882290.0, ans=0.125
2024-08-11 03:05:43,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=882290.0, ans=0.125
2024-08-11 03:05:53,636 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.600e-03
2024-08-11 03:06:03,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=882490.0, ans=0.2
2024-08-11 03:06:14,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0
2024-08-11 03:06:20,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1300, loss[loss=0.1047, beats_loss=0.01233, ecapa_loss=0.0001947, whisper_loss=0.09046, over 23436.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01138, ecapa_loss=0.0002005, whisper_loss=0.09372, over 3844852.51 frames. ], batch size: 93, lr: 9.29e-03, grad_scale: 562949953421312.0
2024-08-11 03:06:45,025 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 03:06:56,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=882790.0, ans=0.125 2024-08-11 03:07:10,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=882890.0, ans=0.125 2024-08-11 03:07:20,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.642e+01 3.016e+01 3.566e+01 8.330e+01, threshold=6.031e+01, percent-clipped=1.0 2024-08-11 03:07:22,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=882990.0, ans=0.125 2024-08-11 03:07:24,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=882990.0, ans=0.0 2024-08-11 03:07:25,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0 2024-08-11 03:07:28,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=882990.0, ans=0.125 2024-08-11 03:07:34,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1350, loss[loss=0.08168, beats_loss=0.01354, ecapa_loss=0.0001612, whisper_loss=0.06653, over 16451.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01134, ecapa_loss=0.0002, whisper_loss=0.09357, over 3864041.94 frames. ], batch size: 65, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:07:40,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=883090.0, ans=0.2 2024-08-11 03:07:43,278 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 03:07:45,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883090.0, ans=0.1 2024-08-11 03:07:56,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=883190.0, ans=0.2 2024-08-11 03:07:57,956 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 03:08:04,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=883290.0, ans=0.0 2024-08-11 03:08:12,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883290.0, ans=0.1 2024-08-11 03:08:33,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883490.0, ans=0.125 2024-08-11 03:08:48,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1400, loss[loss=0.09884, beats_loss=0.009992, ecapa_loss=0.0002114, whisper_loss=0.08673, over 16844.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01136, ecapa_loss=0.0002004, whisper_loss=0.09255, over 3850565.29 frames. ], batch size: 68, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:08:49,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883590.0, ans=0.125 2024-08-11 03:08:50,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.72 vs. 
limit=15.0 2024-08-11 03:08:59,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883590.0, ans=0.125 2024-08-11 03:09:11,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0 2024-08-11 03:09:12,149 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 03:09:20,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=883790.0, ans=0.1 2024-08-11 03:09:22,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=883790.0, ans=0.2 2024-08-11 03:09:23,699 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 03:09:33,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=883890.0, ans=0.125 2024-08-11 03:09:35,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=883890.0, ans=0.0 2024-08-11 03:09:35,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=883890.0, ans=0.125 2024-08-11 03:09:41,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=883890.0, ans=0.125 2024-08-11 03:09:44,449 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 03:09:49,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.657e+01 3.071e+01 3.496e+01 6.029e+01, threshold=6.143e+01, percent-clipped=0.0 2024-08-11 03:09:49,935 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 03:09:54,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=883990.0, ans=0.0 2024-08-11 03:10:01,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.938e-01 2024-08-11 03:10:37,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1450, loss[loss=0.08695, beats_loss=0.01358, ecapa_loss=0.0002023, whisper_loss=0.07135, over 18682.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01131, ecapa_loss=0.0002009, whisper_loss=0.09218, over 3829700.49 frames. ], batch size: 75, lr: 9.29e-03, grad_scale: 562949953421312.0 2024-08-11 03:11:02,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-08-11 03:11:11,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884290.0, ans=0.125 2024-08-11 03:11:14,392 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
23 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-11 03:11:25,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=884390.0, ans=0.125 2024-08-11 03:11:30,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884390.0, ans=0.1 2024-08-11 03:11:39,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=884490.0, ans=0.1 2024-08-11 03:11:53,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1500, loss[loss=0.107, beats_loss=0.01196, ecapa_loss=0.0002214, whisper_loss=0.09281, over 22192.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01129, ecapa_loss=0.0002023, whisper_loss=0.09279, over 3877135.73 frames. ], batch size: 90, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:11:53,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=884590.0, ans=0.0 2024-08-11 03:11:57,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=884590.0, ans=0.1 2024-08-11 03:12:28,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-11 03:12:28,833 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 03:12:30,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=884790.0, ans=0.0 2024-08-11 03:12:42,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=884890.0, ans=0.1 2024-08-11 03:12:53,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.055e+01 2.731e+01 3.107e+01 3.593e+01 6.683e+01, threshold=6.214e+01, percent-clipped=1.0 2024-08-11 03:12:54,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=884990.0, ans=0.125 2024-08-11 03:12:57,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=884990.0, ans=0.2 2024-08-11 03:13:07,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1550, loss[loss=0.11, beats_loss=0.01226, ecapa_loss=0.0001659, whisper_loss=0.09611, over 20017.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01139, ecapa_loss=0.000201, whisper_loss=0.09224, over 3877442.49 frames. ], batch size: 77, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:13:08,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885090.0, ans=0.1 2024-08-11 03:13:12,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=885090.0, ans=0.125 2024-08-11 03:13:23,842 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 03:13:23,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=885190.0, ans=0.0 2024-08-11 03:13:27,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=885190.0, ans=10.0 2024-08-11 03:13:42,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=885290.0, ans=0.0 2024-08-11 03:14:01,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=885390.0, ans=0.2 2024-08-11 03:14:12,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=885490.0, ans=0.0 2024-08-11 03:14:14,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=15.0 2024-08-11 03:14:18,729 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 03:14:20,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.45 vs. limit=5.0 2024-08-11 03:14:21,451 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1600, loss[loss=0.09892, beats_loss=0.01593, ecapa_loss=0.0001468, whisper_loss=0.08152, over 23617.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01145, ecapa_loss=0.0001996, whisper_loss=0.09187, over 3871110.11 frames. 
], batch size: 94, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:14:28,916 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:14:28,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=885590.0, ans=0.2 2024-08-11 03:14:30,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=12.0 2024-08-11 03:14:35,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=885690.0, ans=0.05 2024-08-11 03:14:44,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=885690.0, ans=0.125 2024-08-11 03:15:05,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=885890.0, ans=10.0 2024-08-11 03:15:21,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.608e+01 2.973e+01 3.361e+01 6.559e+01, threshold=5.946e+01, percent-clipped=1.0 2024-08-11 03:15:24,336 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 03:15:26,063 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-11 03:15:30,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=885990.0, ans=0.0 2024-08-11 03:15:34,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1650, loss[loss=0.09019, beats_loss=0.01337, ecapa_loss=0.0001897, whisper_loss=0.07492, over 17933.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01142, ecapa_loss=0.000199, whisper_loss=0.0924, over 3842285.74 frames. 
], batch size: 72, lr: 9.28e-03, grad_scale: 562949953421312.0 2024-08-11 03:15:39,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.41 vs. limit=15.0 2024-08-11 03:16:08,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=886290.0, ans=0.0 2024-08-11 03:16:16,440 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.961e+02 2024-08-11 03:16:25,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=886390.0, ans=0.125 2024-08-11 03:16:37,248 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 03:16:44,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1700, loss[loss=0.1466, beats_loss=0.007677, ecapa_loss=0.0002459, whisper_loss=0.1365, over 22663.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001996, whisper_loss=0.09317, over 3847504.12 frames. ], batch size: 87, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:17:08,821 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 03:17:10,093 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 03:17:13,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=886790.0, ans=0.125 2024-08-11 03:17:22,363 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 03:17:23,636 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 03:17:25,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886890.0, ans=0.125 2024-08-11 03:17:42,201 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.697e+01 3.081e+01 3.373e+01 4.997e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 03:17:55,098 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1750, loss[loss=0.109, beats_loss=0.01078, ecapa_loss=0.0002317, whisper_loss=0.0959, over 20671.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01133, ecapa_loss=0.0001991, whisper_loss=0.09307, over 3863783.34 frames. ], batch size: 86, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:18:20,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=887190.0, ans=0.125 2024-08-11 03:18:33,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=887290.0, ans=0.125 2024-08-11 03:18:56,688 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 25 from Vox, 15 fro AS 2024-08-11 03:18:59,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.11 vs. limit=15.0 2024-08-11 03:19:03,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1800, loss[loss=0.08383, beats_loss=0.01364, ecapa_loss=0.0001534, whisper_loss=0.06866, over 19240.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01128, ecapa_loss=0.0002004, whisper_loss=0.09207, over 3821693.88 frames. ], batch size: 74, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:19:04,496 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-11 03:19:05,801 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
38 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 03:19:07,060 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 03:19:13,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=887590.0, ans=0.04949747468305833 2024-08-11 03:19:21,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887690.0, ans=0.125 2024-08-11 03:19:31,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=887790.0, ans=0.0 2024-08-11 03:19:32,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887790.0, ans=0.125 2024-08-11 03:19:36,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=887790.0, ans=0.125 2024-08-11 03:19:43,017 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 03:19:45,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2024-08-11 03:19:58,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887990.0, ans=0.125 2024-08-11 03:20:00,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.601e+01 2.973e+01 3.471e+01 4.949e+01, threshold=5.947e+01, percent-clipped=0.0 2024-08-11 03:20:00,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-11 03:20:05,772 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 03:20:08,935 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 03:20:13,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1850, loss[loss=0.09343, beats_loss=0.01041, ecapa_loss=0.0002106, whisper_loss=0.08091, over 14269.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0002011, whisper_loss=0.09261, over 3800558.49 frames. ], batch size: 56, lr: 9.27e-03, grad_scale: 562949953421312.0 2024-08-11 03:20:27,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=888190.0, ans=0.1 2024-08-11 03:20:34,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=888190.0, ans=0.2 2024-08-11 03:20:36,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=888190.0, ans=0.125 2024-08-11 03:21:14,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-11 03:21:22,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1900, loss[loss=0.098, beats_loss=0.01245, ecapa_loss=0.0001779, whisper_loss=0.08377, over 22887.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01138, ecapa_loss=0.0002034, whisper_loss=0.09206, over 3822771.26 frames. 
], batch size: 91, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:21:25,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=888590.0, ans=0.0 2024-08-11 03:21:29,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=888590.0, ans=15.0 2024-08-11 03:21:53,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=888790.0, ans=0.125 2024-08-11 03:21:56,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=888790.0, ans=0.125 2024-08-11 03:22:01,079 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 03:22:10,225 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 40 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 03:22:16,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+01 2.590e+01 3.002e+01 3.327e+01 6.064e+01, threshold=6.004e+01, percent-clipped=1.0 2024-08-11 03:22:20,421 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 03:22:29,225 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 03:22:30,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 1950, loss[loss=0.07566, beats_loss=0.008718, ecapa_loss=0.0002116, whisper_loss=0.06483, over 16415.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01135, ecapa_loss=0.0002051, whisper_loss=0.09246, over 3821059.72 frames. 
], batch size: 64, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:22:39,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889090.0, ans=0.125 2024-08-11 03:22:43,909 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 03:22:47,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=889190.0, ans=0.2 2024-08-11 03:22:53,560 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 03:22:54,769 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 03:22:55,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-11 03:22:57,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=889290.0, ans=0.125 2024-08-11 03:23:04,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=889290.0, ans=0.0 2024-08-11 03:23:19,332 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 03:23:24,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=889490.0, ans=0.125 2024-08-11 03:23:28,481 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 03:23:29,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=10.0 2024-08-11 03:23:36,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=889490.0, ans=0.0 2024-08-11 03:23:38,605 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2000, loss[loss=0.08682, beats_loss=0.01308, ecapa_loss=0.000169, whisper_loss=0.07205, over 14887.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.000206, whisper_loss=0.09275, over 3826851.30 frames. ], batch size: 56, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:23:39,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=889590.0, ans=0.09899494936611666 2024-08-11 03:23:44,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=889590.0, ans=0.0 2024-08-11 03:23:48,266 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 26 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-11 03:23:53,498 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 03:24:01,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=12.0 2024-08-11 03:24:05,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=889790.0, ans=0.5 2024-08-11 03:24:14,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=889790.0, ans=0.015 2024-08-11 03:24:29,437 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 03:24:34,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.753e+01 3.127e+01 3.595e+01 5.672e+01, threshold=6.254e+01, percent-clipped=0.0 2024-08-11 03:24:39,170 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
25 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 03:24:43,382 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-11 03:24:47,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2050, loss[loss=0.09603, beats_loss=0.01203, ecapa_loss=0.0002324, whisper_loss=0.08167, over 18864.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01143, ecapa_loss=0.0002053, whisper_loss=0.09201, over 3827827.31 frames. ], batch size: 78, lr: 9.26e-03, grad_scale: 562949953421312.0 2024-08-11 03:24:51,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2024-08-11 03:24:51,995 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 03:24:58,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=890090.0, ans=0.125 2024-08-11 03:25:02,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=890190.0, ans=0.125 2024-08-11 03:25:17,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890290.0, ans=0.1 2024-08-11 03:25:27,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=890290.0, ans=0.125 2024-08-11 03:25:40,398 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 03:25:43,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=890390.0, ans=0.2 2024-08-11 03:25:43,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. 
limit=15.0 2024-08-11 03:25:56,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=890490.0, ans=10.0 2024-08-11 03:26:00,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2100, loss[loss=0.1102, beats_loss=0.01184, ecapa_loss=0.0001752, whisper_loss=0.09661, over 19225.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01143, ecapa_loss=0.0002052, whisper_loss=0.09234, over 3829441.03 frames. ], batch size: 75, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:26:21,483 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-11 03:26:26,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890690.0, ans=0.1 2024-08-11 03:26:27,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2024-08-11 03:26:31,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=890790.0, ans=0.0 2024-08-11 03:26:32,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=890790.0, ans=0.5 2024-08-11 03:26:33,817 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 03:26:43,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890890.0, ans=0.1 2024-08-11 03:26:52,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=890890.0, ans=0.125 2024-08-11 03:27:01,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.655e+01 3.007e+01 3.449e+01 4.820e+01, threshold=6.014e+01, percent-clipped=0.0 2024-08-11 03:27:14,278 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2150, loss[loss=0.09118, beats_loss=0.01233, ecapa_loss=0.000199, whisper_loss=0.07686, over 13864.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01147, ecapa_loss=0.0002048, whisper_loss=0.0926, over 3821671.18 frames. ], batch size: 53, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:27:26,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-11 03:27:29,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=891190.0, ans=0.0 2024-08-11 03:27:30,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=891190.0, ans=0.125 2024-08-11 03:27:40,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-11 03:27:40,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2024-08-11 03:27:41,501 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
24 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-11 03:28:01,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=891390.0, ans=0.0 2024-08-11 03:28:06,918 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 03:28:11,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=891490.0, ans=0.1 2024-08-11 03:28:16,645 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 03:28:26,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2200, loss[loss=0.1223, beats_loss=0.00972, ecapa_loss=0.0001974, whisper_loss=0.1106, over 18064.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01149, ecapa_loss=0.0002056, whisper_loss=0.09385, over 3846530.34 frames. ], batch size: 69, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:28:34,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=891590.0, ans=0.0 2024-08-11 03:28:45,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=891690.0, ans=0.0 2024-08-11 03:28:59,844 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 03:29:04,374 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 03:29:12,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=891890.0, ans=0.125 2024-08-11 03:29:27,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.671e+01 3.021e+01 3.496e+01 5.518e+01, threshold=6.042e+01, percent-clipped=0.0 2024-08-11 03:29:35,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=891990.0, ans=0.1 2024-08-11 03:29:35,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=891990.0, ans=0.0 2024-08-11 03:29:40,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2250, loss[loss=0.07696, beats_loss=0.01569, ecapa_loss=0.0001892, whisper_loss=0.05938, over 14113.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01167, ecapa_loss=0.0002061, whisper_loss=0.09286, over 3843871.92 frames. ], batch size: 58, lr: 9.25e-03, grad_scale: 562949953421312.0 2024-08-11 03:29:52,560 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 03:29:53,823 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 03:29:57,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=892190.0, ans=0.125 2024-08-11 03:30:01,421 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 03:30:10,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=892290.0, ans=0.0 2024-08-11 03:30:16,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. 
limit=15.0 2024-08-11 03:30:24,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=892390.0, ans=0.125 2024-08-11 03:30:27,289 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 03:30:40,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2024-08-11 03:30:41,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=892490.0, ans=0.125 2024-08-11 03:30:52,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2300, loss[loss=0.1078, beats_loss=0.009419, ecapa_loss=0.0002446, whisper_loss=0.09598, over 19229.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01161, ecapa_loss=0.0002058, whisper_loss=0.09377, over 3866482.40 frames. ], batch size: 77, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:31:08,556 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 03:31:52,104 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 03:31:54,828 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.862e+01 3.270e+01 3.564e+01 5.997e+01, threshold=6.539e+01, percent-clipped=0.0 2024-08-11 03:32:09,171 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2350, loss[loss=0.09173, beats_loss=0.01455, ecapa_loss=0.0001672, whisper_loss=0.07551, over 23008.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01157, ecapa_loss=0.000208, whisper_loss=0.09412, over 3858026.54 frames. 
], batch size: 91, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:32:16,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893090.0, ans=0.1 2024-08-11 03:32:39,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:32:46,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-11 03:32:49,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=893290.0, ans=0.125 2024-08-11 03:32:54,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=893390.0, ans=0.2 2024-08-11 03:32:58,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=893390.0, ans=0.125 2024-08-11 03:33:10,397 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 03:33:25,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2400, loss[loss=0.107, beats_loss=0.01242, ecapa_loss=0.0001685, whisper_loss=0.09294, over 22998.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01155, ecapa_loss=0.0002075, whisper_loss=0.09447, over 3870071.73 frames. 
], batch size: 90, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:33:43,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=893690.0, ans=0.0 2024-08-11 03:34:06,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=893790.0, ans=0.125 2024-08-11 03:34:28,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.589e+01 2.936e+01 3.311e+01 5.160e+01, threshold=5.871e+01, percent-clipped=0.0 2024-08-11 03:34:30,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=893990.0, ans=0.05 2024-08-11 03:34:34,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-11 03:34:36,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-11 03:34:40,991 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 03:34:44,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2450, loss[loss=0.07745, beats_loss=0.01513, ecapa_loss=0.0001964, whisper_loss=0.06035, over 14849.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01153, ecapa_loss=0.0002071, whisper_loss=0.09416, over 3884789.28 frames. ], batch size: 61, lr: 9.24e-03, grad_scale: 562949953421312.0 2024-08-11 03:34:44,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. 
limit=15.0 2024-08-11 03:34:53,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=894090.0, ans=0.125 2024-08-11 03:34:59,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=894190.0, ans=0.125 2024-08-11 03:35:05,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=894190.0, ans=0.125 2024-08-11 03:35:26,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=894290.0, ans=0.125 2024-08-11 03:35:34,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=894390.0, ans=0.125 2024-08-11 03:35:39,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=894390.0, ans=0.0 2024-08-11 03:35:44,957 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 03:35:46,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=894490.0, ans=0.125 2024-08-11 03:35:49,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894490.0, ans=0.125 2024-08-11 03:35:52,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=894490.0, ans=0.125 2024-08-11 03:35:58,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2500, loss[loss=0.1127, beats_loss=0.0102, ecapa_loss=0.0002233, whisper_loss=0.1002, over 19421.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.0114, ecapa_loss=0.0002079, whisper_loss=0.09476, over 3863854.99 frames. 
], batch size: 78, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:36:02,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=894590.0, ans=0.125 2024-08-11 03:36:38,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=894790.0, ans=0.125 2024-08-11 03:36:48,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=894890.0, ans=0.0 2024-08-11 03:36:49,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=894890.0, ans=0.125 2024-08-11 03:36:57,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=894890.0, ans=0.0 2024-08-11 03:37:03,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.242e+01 2.710e+01 3.039e+01 3.423e+01 5.787e+01, threshold=6.079e+01, percent-clipped=0.0 2024-08-11 03:37:16,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2550, loss[loss=0.1043, beats_loss=0.01017, ecapa_loss=0.0002349, whisper_loss=0.09176, over 19771.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01137, ecapa_loss=0.0002074, whisper_loss=0.09504, over 3873904.60 frames. ], batch size: 77, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:37:23,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895090.0, ans=0.1 2024-08-11 03:37:30,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.89 vs. 
limit=15.0 2024-08-11 03:37:36,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=895190.0, ans=0.125 2024-08-11 03:37:39,283 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 27 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 03:38:00,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=895290.0, ans=0.2 2024-08-11 03:38:30,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=12.0 2024-08-11 03:38:33,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2600, loss[loss=0.1094, beats_loss=0.01364, ecapa_loss=0.0001495, whisper_loss=0.09424, over 14080.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01137, ecapa_loss=0.0002078, whisper_loss=0.09452, over 3851032.36 frames. ], batch size: 55, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:38:49,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=895690.0, ans=0.125 2024-08-11 03:38:50,877 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 03:38:56,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=895690.0, ans=0.125 2024-08-11 03:39:18,302 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 03:39:23,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=895890.0, ans=0.125 2024-08-11 03:39:33,343 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 03:39:36,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.572e+01 2.896e+01 3.197e+01 4.923e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 03:39:50,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2650, loss[loss=0.1067, beats_loss=0.008481, ecapa_loss=0.0002705, whisper_loss=0.0955, over 17466.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01142, ecapa_loss=0.0002075, whisper_loss=0.09437, over 3839478.70 frames. ], batch size: 72, lr: 9.23e-03, grad_scale: 562949953421312.0 2024-08-11 03:39:53,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=896090.0, ans=0.125 2024-08-11 03:39:57,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=896090.0, ans=0.125 2024-08-11 03:40:07,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896190.0, ans=0.1 2024-08-11 03:40:25,228 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 03:40:27,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896290.0, ans=0.1 2024-08-11 03:40:39,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=896390.0, ans=0.125 2024-08-11 03:40:50,877 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 03:40:52,418 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 03:41:02,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=896490.0, ans=0.2 2024-08-11 03:41:05,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2700, loss[loss=0.09435, beats_loss=0.01305, ecapa_loss=0.0001766, whisper_loss=0.07953, over 20678.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01152, ecapa_loss=0.0002092, whisper_loss=0.09363, over 3859794.00 frames. ], batch size: 84, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:41:24,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=896690.0, ans=0.0 2024-08-11 03:41:27,016 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 03:41:39,679 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 03:41:56,739 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-11 03:42:01,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=896890.0, ans=0.2 2024-08-11 03:42:01,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896890.0, ans=0.1 2024-08-11 03:42:04,125 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 03:42:09,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=896990.0, ans=0.125 2024-08-11 03:42:11,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.641e+01 2.955e+01 3.583e+01 6.037e+01, threshold=5.910e+01, percent-clipped=1.0 2024-08-11 03:42:25,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2750, loss[loss=0.1231, beats_loss=0.01068, ecapa_loss=0.0002244, whisper_loss=0.1102, over 23754.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0115, ecapa_loss=0.0002093, whisper_loss=0.0942, over 3849698.31 frames. ], batch size: 93, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:42:35,317 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 03:42:43,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=897190.0, ans=0.125 2024-08-11 03:42:44,539 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 03:42:46,063 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 03:43:15,475 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-11 03:43:32,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-11 03:43:35,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=897490.0, ans=0.0 2024-08-11 03:43:43,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.82 vs. 
limit=15.0 2024-08-11 03:43:43,792 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2800, loss[loss=0.1182, beats_loss=0.01166, ecapa_loss=0.0001864, whisper_loss=0.1047, over 20500.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01152, ecapa_loss=0.000209, whisper_loss=0.09384, over 3823786.75 frames. ], batch size: 77, lr: 9.22e-03, grad_scale: 562949953421312.0 2024-08-11 03:44:08,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=897690.0, ans=0.2 2024-08-11 03:44:12,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=897690.0, ans=0.5 2024-08-11 03:44:20,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=897790.0, ans=0.0 2024-08-11 03:44:21,882 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 03:44:35,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=897890.0, ans=0.125 2024-08-11 03:44:35,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-11 03:44:48,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.710e+01 2.962e+01 3.650e+01 5.339e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 03:45:01,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2024-08-11 03:45:02,311 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2850, loss[loss=0.08333, beats_loss=0.01189, ecapa_loss=0.0002212, whisper_loss=0.06923, over 13229.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01157, ecapa_loss=0.0002086, whisper_loss=0.09377, over 3817882.15 frames. ], batch size: 56, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:45:38,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=898290.0, ans=0.2 2024-08-11 03:45:39,855 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 03:45:41,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=898290.0, ans=0.125 2024-08-11 03:45:59,731 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 03:46:13,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=898490.0, ans=0.5 2024-08-11 03:46:23,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=898590.0, ans=0.2 2024-08-11 03:46:25,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2900, loss[loss=0.1143, beats_loss=0.01069, ecapa_loss=0.000258, whisper_loss=0.1011, over 21706.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01157, ecapa_loss=0.0002088, whisper_loss=0.09344, over 3836554.57 frames. ], batch size: 91, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:46:33,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.072e-03 2024-08-11 03:46:34,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=898590.0, ans=0.0 2024-08-11 03:46:44,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898690.0, ans=0.1 2024-08-11 03:46:47,698 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
39 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 03:47:00,451 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 03:47:00,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=898790.0, ans=0.09899494936611666 2024-08-11 03:47:30,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.297e+01 2.659e+01 2.989e+01 3.721e+01 7.203e+01, threshold=5.978e+01, percent-clipped=1.0 2024-08-11 03:47:39,990 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 03:47:45,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 2950, loss[loss=0.1206, beats_loss=0.009877, ecapa_loss=0.0002217, whisper_loss=0.1085, over 18484.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01157, ecapa_loss=0.0002094, whisper_loss=0.0936, over 3883626.73 frames. ], batch size: 73, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:47:52,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=12.0 2024-08-11 03:47:57,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899090.0, ans=0.1 2024-08-11 03:48:01,206 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 03:48:05,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=12.0 2024-08-11 03:48:29,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=899290.0, ans=0.125 2024-08-11 03:48:35,955 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 03:48:39,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=899390.0, ans=0.125 2024-08-11 03:48:42,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-08-11 03:48:48,392 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 03:48:49,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=899490.0, ans=0.125 2024-08-11 03:48:55,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=899490.0, ans=0.0 2024-08-11 03:48:58,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=899490.0, ans=0.125 2024-08-11 03:49:08,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3000, loss[loss=0.1176, beats_loss=0.01017, ecapa_loss=0.0001956, whisper_loss=0.1055, over 18565.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01155, ecapa_loss=0.00021, whisper_loss=0.09362, over 3877566.60 frames. ], batch size: 72, lr: 9.21e-03, grad_scale: 562949953421312.0 2024-08-11 03:49:08,463 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 03:49:38,851 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.9166, 1.3757, 1.9113, 1.5150, 1.8736, 1.8279, 2.0377, 1.7834], device='cuda:0') 2024-08-11 03:49:48,974 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006718, whisper_loss=0.2519, over 922467.00 frames. 
2024-08-11 03:50:07,519 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on SV_voxceleb1: loss=0.005617, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0, over 939242.00 frames. 2024-08-11 03:51:52,901 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6282, 4.0134, 4.4351, 4.4334], device='cuda:0') 2024-08-11 03:52:03,499 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on AT_audioset: loss=0.02572, beats_loss=0.02572, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 03:52:03,505 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 03:52:11,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=899590.0, ans=0.0 2024-08-11 03:52:11,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=899590.0, ans=0.0 2024-08-11 03:52:16,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=899590.0, ans=15.0 2024-08-11 03:52:16,912 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 03:52:26,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=899690.0, ans=0.1 2024-08-11 03:52:51,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=899790.0, ans=0.125 2024-08-11 03:52:55,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=899890.0, ans=0.0 2024-08-11 03:53:06,699 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 03:53:06,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=899890.0, ans=0.125 2024-08-11 03:53:15,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.673e+01 3.038e+01 3.538e+01 6.757e+01, threshold=6.077e+01, percent-clipped=1.0 2024-08-11 03:53:30,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3050, loss[loss=0.1109, beats_loss=0.01054, ecapa_loss=0.0002216, whisper_loss=0.09813, over 19923.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01158, ecapa_loss=0.0002107, whisper_loss=0.0938, over 3871675.80 frames. ], batch size: 79, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:53:34,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=900090.0, ans=0.2 2024-08-11 03:53:50,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=900190.0, ans=0.125 2024-08-11 03:54:06,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=900290.0, ans=0.125 2024-08-11 03:54:28,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900390.0, ans=0.0 2024-08-11 03:54:32,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=900390.0, ans=0.125 2024-08-11 03:54:37,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900390.0, ans=0.125 2024-08-11 03:54:47,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. 
limit=15.0 2024-08-11 03:54:50,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=900490.0, ans=0.125 2024-08-11 03:54:57,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3100, loss[loss=0.112, beats_loss=0.01146, ecapa_loss=0.0002194, whisper_loss=0.09837, over 22143.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01155, ecapa_loss=0.000211, whisper_loss=0.09412, over 3899620.63 frames. ], batch size: 87, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:54:59,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=900590.0, ans=0.1 2024-08-11 03:55:00,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900590.0, ans=0.125 2024-08-11 03:55:04,161 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS 2024-08-11 03:55:04,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-08-11 03:55:10,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=900590.0, ans=0.125 2024-08-11 03:55:34,530 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 03:55:47,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=900890.0, ans=0.125 2024-08-11 03:56:05,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.642e+01 2.994e+01 3.477e+01 5.395e+01, threshold=5.988e+01, percent-clipped=0.0 2024-08-11 03:56:08,832 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.271e-01 2024-08-11 03:56:17,050 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 17 from Vox, 42 from AS 2024-08-11 03:56:20,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3150, loss[loss=0.1171, beats_loss=0.01157, ecapa_loss=0.0002051, whisper_loss=0.1035, over 17067.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01156, ecapa_loss=0.0002095, whisper_loss=0.09387, over 3889804.46 frames. ], batch size: 69, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:56:22,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=15.0 2024-08-11 03:56:23,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=901090.0, ans=0.125 2024-08-11 03:56:23,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=901090.0, ans=0.0 2024-08-11 03:56:28,322 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 15 from Vox, 28 from AS 2024-08-11 03:57:18,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=901390.0, ans=0.5 2024-08-11 03:57:22,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=901390.0, ans=0.0 2024-08-11 03:57:27,875 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 from AS 2024-08-11 03:57:35,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=901490.0, ans=0.0 2024-08-11 03:57:41,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=15.0 2024-08-11 03:57:44,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3200, loss[loss=0.149, beats_loss=0.009198, ecapa_loss=0.0002097, whisper_loss=0.1377, over 23251.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01144, ecapa_loss=0.0002118, whisper_loss=0.09464, over 3883335.78 frames. ], batch size: 87, lr: 9.20e-03, grad_scale: 1125899906842624.0 2024-08-11 03:58:02,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=901690.0, ans=0.125 2024-08-11 03:58:12,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=901690.0, ans=10.0 2024-08-11 03:58:19,048 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 03:58:28,347 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 24 from Vox, 32 from AS 2024-08-11 03:58:51,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.680e+01 2.966e+01 3.598e+01 6.746e+01, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 03:58:51,987 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 03:58:58,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-11 03:59:06,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3250, loss[loss=0.08415, beats_loss=0.0121, ecapa_loss=0.0002451, whisper_loss=0.0696, over 14983.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002119, whisper_loss=0.09391, over 3874132.18 frames. ], batch size: 62, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 03:59:08,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=902090.0, ans=0.2 2024-08-11 03:59:17,321 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-11 03:59:20,846 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 from AS 2024-08-11 03:59:21,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=902190.0, ans=0.125 2024-08-11 04:00:14,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2024-08-11 04:00:23,118 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 04:00:25,920 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3300, loss[loss=0.1127, beats_loss=0.01374, ecapa_loss=0.0001922, whisper_loss=0.09708, over 21811.00 frames. 
], tot_loss[loss=0.1073, beats_loss=0.01147, ecapa_loss=0.0002126, whisper_loss=0.09371, over 3849337.37 frames. ], batch size: 88, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:00:26,031 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 from AS 2024-08-11 04:00:40,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=902590.0, ans=0.0 2024-08-11 04:00:59,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0 2024-08-11 04:01:14,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2024-08-11 04:01:29,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-11 04:01:38,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.725e+01 3.245e+01 3.907e+01 7.359e+01, threshold=6.490e+01, percent-clipped=2.0 2024-08-11 04:01:51,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=903090.0, ans=0.125 2024-08-11 04:01:52,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3350, loss[loss=0.1219, beats_loss=0.01068, ecapa_loss=0.0001763, whisper_loss=0.1095, over 23111.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01138, ecapa_loss=0.0002119, whisper_loss=0.09514, over 3874898.69 frames. ], batch size: 91, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:01:56,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.82 vs. 
limit=12.0 2024-08-11 04:02:03,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-11 04:02:07,679 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 04:02:33,794 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 04:02:46,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-08-11 04:02:56,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903490.0, ans=0.1 2024-08-11 04:03:04,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=903490.0, ans=0.125 2024-08-11 04:03:13,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3400, loss[loss=0.1013, beats_loss=0.01319, ecapa_loss=0.0001791, whisper_loss=0.08631, over 17974.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01134, ecapa_loss=0.000211, whisper_loss=0.09514, over 3863753.76 frames. ], batch size: 72, lr: 9.19e-03, grad_scale: 1125899906842624.0 2024-08-11 04:03:16,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.05 vs. limit=10.0 2024-08-11 04:03:26,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903590.0, ans=0.1 2024-08-11 04:03:40,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.15 vs. 
limit=22.5 2024-08-11 04:04:10,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=903890.0, ans=0.125 2024-08-11 04:04:18,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.715e+01 3.132e+01 3.599e+01 6.001e+01, threshold=6.265e+01, percent-clipped=0.0 2024-08-11 04:04:32,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3450, loss[loss=0.1078, beats_loss=0.01334, ecapa_loss=0.000205, whisper_loss=0.09242, over 16815.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01139, ecapa_loss=0.0002097, whisper_loss=0.09477, over 3874104.02 frames. ], batch size: 68, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:04:45,728 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-11 04:04:51,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=904190.0, ans=0.1 2024-08-11 04:04:55,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=904190.0, ans=0.125 2024-08-11 04:04:57,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-08-11 04:05:26,685 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 04:05:37,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. 
limit=15.0 2024-08-11 04:05:40,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=904490.0, ans=0.07 2024-08-11 04:05:42,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3500, loss[loss=0.1238, beats_loss=0.01293, ecapa_loss=0.0001799, whisper_loss=0.1091, over 21778.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01142, ecapa_loss=0.0002081, whisper_loss=0.09491, over 3875985.09 frames. ], batch size: 85, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:05:46,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=904590.0, ans=0.0 2024-08-11 04:05:51,739 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 from AS 2024-08-11 04:05:53,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=904590.0, ans=0.125 2024-08-11 04:06:06,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=904690.0, ans=0.09899494936611666 2024-08-11 04:06:13,934 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 36 from Vox, 27 from AS 2024-08-11 04:06:19,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=904790.0, ans=0.0 2024-08-11 04:06:30,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=904890.0, ans=0.125 2024-08-11 04:06:35,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+01 2.786e+01 3.047e+01 3.456e+01 6.070e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 04:06:46,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=905090.0, ans=0.05 2024-08-11 04:06:47,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3550, loss[loss=0.1154, beats_loss=0.01037, ecapa_loss=0.0001884, whisper_loss=0.1031, over 23348.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01135, ecapa_loss=0.0002082, whisper_loss=0.09537, over 3876535.52 frames. ], batch size: 92, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:06:47,165 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 20 from Vox, 33 from AS 2024-08-11 04:06:48,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=905090.0, ans=0.125 2024-08-11 04:07:26,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=905390.0, ans=0.0 2024-08-11 04:07:27,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=905390.0, ans=0.035 2024-08-11 04:07:41,425 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 11 from Vox, 28 from AS 2024-08-11 04:07:45,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905490.0, ans=0.125 2024-08-11 04:07:50,610 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 17 from Vox, 40 from AS 2024-08-11 04:07:52,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2024-08-11 04:07:53,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3600, loss[loss=0.08704, beats_loss=0.01481, ecapa_loss=0.000198, whisper_loss=0.07025, over 20103.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01141, ecapa_loss=0.0002089, whisper_loss=0.09477, over 3873941.15 frames. ], batch size: 84, lr: 9.18e-03, grad_scale: 1125899906842624.0 2024-08-11 04:07:54,802 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS 2024-08-11 04:07:55,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905590.0, ans=0.125 2024-08-11 04:08:08,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=905690.0, ans=0.025 2024-08-11 04:08:10,389 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 from AS 2024-08-11 04:08:15,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2024-08-11 04:08:29,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=905790.0, ans=0.2 2024-08-11 04:08:30,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=905790.0, ans=0.04949747468305833 2024-08-11 04:08:33,001 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 04:08:33,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=905890.0, ans=0.1 2024-08-11 04:08:35,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=905890.0, ans=0.125 2024-08-11 04:08:47,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.715e+01 3.031e+01 3.500e+01 1.161e+02, threshold=6.062e+01, percent-clipped=1.0 2024-08-11 04:08:49,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-11 04:08:56,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=905990.0, ans=0.125 2024-08-11 04:08:59,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3650, loss[loss=0.09404, beats_loss=0.01109, ecapa_loss=0.0002307, whisper_loss=0.08064, over 16932.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01145, ecapa_loss=0.0002104, whisper_loss=0.09461, over 3866520.65 frames. ], batch size: 69, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:09:00,906 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 17 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 04:09:12,408 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-11 04:09:15,128 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 24 from LS+wenet, 15 from Vox, 19 from AS 2024-08-11 04:09:17,589 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 24 from Vox, 24 from AS 2024-08-11 04:09:18,850 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-11 04:09:19,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=906190.0, ans=0.125 2024-08-11 04:09:26,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2024-08-11 04:09:38,173 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 23 from Vox, 40 from AS 2024-08-11 04:09:58,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=906490.0, ans=0.0 2024-08-11 04:09:58,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-11 04:10:04,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3700, loss[loss=0.131, beats_loss=0.008611, ecapa_loss=0.0001986, whisper_loss=0.1204, over 21604.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0113, ecapa_loss=0.0002109, whisper_loss=0.09538, over 3861858.05 frames. ], batch size: 83, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:10:07,055 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 04:10:09,549 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 16 from Vox, 25 from AS 2024-08-11 04:10:39,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=906790.0, ans=0.125 2024-08-11 04:10:58,333 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.695e+01 3.038e+01 3.419e+01 5.061e+01, threshold=6.077e+01, percent-clipped=0.0 2024-08-11 04:11:03,976 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 16 from Vox, 46 from AS 2024-08-11 04:11:04,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906990.0, ans=0.1 2024-08-11 04:11:10,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3750, loss[loss=0.1088, beats_loss=0.01169, ecapa_loss=0.0002253, whisper_loss=0.09489, over 23098.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01149, ecapa_loss=0.0002097, whisper_loss=0.09471, over 3893500.99 frames. ], batch size: 94, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:11:13,649 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 from AS 2024-08-11 04:11:16,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=907090.0, ans=0.125 2024-08-11 04:11:56,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=907390.0, ans=0.125 2024-08-11 04:11:58,998 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 18 from Vox, 22 from AS 2024-08-11 04:12:02,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=907490.0, ans=0.1 2024-08-11 04:12:08,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=907490.0, ans=0.0 2024-08-11 04:12:16,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3800, loss[loss=0.09336, beats_loss=0.01027, ecapa_loss=0.0002794, whisper_loss=0.08029, over 18257.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01153, ecapa_loss=0.0002104, whisper_loss=0.09365, over 3895856.97 frames. ], batch size: 79, lr: 9.17e-03, grad_scale: 1125899906842624.0 2024-08-11 04:12:32,987 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:12:59,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=907890.0, ans=0.125 2024-08-11 04:13:00,349 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 04:13:08,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=907990.0, ans=0.125 2024-08-11 04:13:09,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.721e+01 2.981e+01 3.416e+01 8.567e+01, threshold=5.961e+01, percent-clipped=1.0 2024-08-11 04:13:20,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=15.0 2024-08-11 04:13:22,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3850, loss[loss=0.115, beats_loss=0.01233, ecapa_loss=0.0002072, whisper_loss=0.1007, over 23572.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.01153, ecapa_loss=0.0002138, whisper_loss=0.09375, over 3878181.78 frames. ], batch size: 93, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:13:22,338 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 04:13:24,114 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 04:13:24,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-08-11 04:13:39,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=908190.0, ans=0.125 2024-08-11 04:13:56,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908290.0, ans=0.1 2024-08-11 04:14:04,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=908390.0, ans=0.125 2024-08-11 04:14:28,510 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 from AS 2024-08-11 04:14:32,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3900, loss[loss=0.1023, beats_loss=0.01137, ecapa_loss=0.000241, whisper_loss=0.08854, over 21557.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01149, ecapa_loss=0.0002146, whisper_loss=0.09426, over 3875394.03 frames. ], batch size: 88, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:14:56,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=22.5 2024-08-11 04:14:57,536 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 from AS 2024-08-11 04:15:12,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=908790.0, ans=0.125 2024-08-11 04:15:15,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=908890.0, ans=0.0 2024-08-11 04:15:23,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=908890.0, ans=0.125 2024-08-11 04:15:29,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=908890.0, ans=0.125 2024-08-11 04:15:32,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.202e+01 2.684e+01 3.033e+01 3.679e+01 6.201e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 04:15:33,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=908990.0, ans=0.125 2024-08-11 04:15:45,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 3950, loss[loss=0.1239, beats_loss=0.009226, ecapa_loss=0.0002089, whisper_loss=0.1126, over 23881.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01143, ecapa_loss=0.0002131, whisper_loss=0.09506, over 3890293.89 frames. ], batch size: 94, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:15:46,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909090.0, ans=0.1 2024-08-11 04:15:49,815 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 04:16:03,806 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 15 from Vox, 49 from AS 2024-08-11 04:16:04,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5 2024-08-11 04:16:11,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=909190.0, ans=0.2 2024-08-11 04:16:17,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=909290.0, ans=0.1 2024-08-11 04:16:39,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=909390.0, ans=0.2 2024-08-11 04:16:50,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=909490.0, ans=0.0 2024-08-11 04:16:54,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=909490.0, ans=0.04949747468305833 2024-08-11 04:16:59,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4000, loss[loss=0.127, beats_loss=0.009131, ecapa_loss=0.0002542, whisper_loss=0.1154, over 18078.00 frames. ], tot_loss[loss=0.1092, beats_loss=0.01139, ecapa_loss=0.0002147, whisper_loss=0.09563, over 3915639.97 frames. ], batch size: 73, lr: 9.16e-03, grad_scale: 1125899906842624.0 2024-08-11 04:17:02,406 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 from AS 2024-08-11 04:17:11,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2024-08-11 04:17:43,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. 
limit=15.0 2024-08-11 04:17:54,466 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 04:17:57,465 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 10 from Vox, 29 from AS 2024-08-11 04:17:59,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=909990.0, ans=0.1 2024-08-11 04:18:00,775 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.789e+01 3.181e+01 3.971e+01 6.202e+01, threshold=6.363e+01, percent-clipped=1.0 2024-08-11 04:18:06,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=909990.0, ans=0.09899494936611666 2024-08-11 04:18:07,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=909990.0, ans=0.025 2024-08-11 04:18:15,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4050, loss[loss=0.1194, beats_loss=0.009782, ecapa_loss=0.0002179, whisper_loss=0.1075, over 19992.00 frames. ], tot_loss[loss=0.1102, beats_loss=0.01133, ecapa_loss=0.0002156, whisper_loss=0.09673, over 3902557.46 frames. ], batch size: 80, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:18:15,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=910090.0, ans=0.0 2024-08-11 04:18:29,470 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 15 from Vox, 29 from AS 2024-08-11 04:18:41,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=910190.0, ans=0.0 2024-08-11 04:18:53,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=910290.0, ans=0.1 2024-08-11 04:18:55,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=910290.0, ans=6.0 2024-08-11 04:18:59,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-11 04:19:15,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-11 04:19:18,603 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS 2024-08-11 04:19:21,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=910490.0, ans=0.2 2024-08-11 04:19:29,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=910590.0, ans=0.125 2024-08-11 04:19:30,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4100, loss[loss=0.1149, beats_loss=0.01138, ecapa_loss=0.0001852, whisper_loss=0.1016, over 22373.00 frames. ], tot_loss[loss=0.1094, beats_loss=0.01139, ecapa_loss=0.0002144, whisper_loss=0.09589, over 3890172.11 frames. 
], batch size: 87, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:19:30,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=910590.0, ans=0.0 2024-08-11 04:19:34,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=910590.0, ans=0.0 2024-08-11 04:19:52,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-11 04:20:03,949 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-11 04:20:04,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910790.0, ans=0.1 2024-08-11 04:20:10,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.08 vs. limit=15.0 2024-08-11 04:20:15,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=910890.0, ans=15.0 2024-08-11 04:20:24,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=910890.0, ans=0.125 2024-08-11 04:20:31,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.238e+01 2.728e+01 2.968e+01 3.426e+01 6.142e+01, threshold=5.935e+01, percent-clipped=0.0 2024-08-11 04:20:32,035 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 25 from LS+wenet, 35 from Vox, 37 from AS 2024-08-11 04:20:33,725 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.177e-02 2024-08-11 04:20:44,749 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 33 from Vox, 29 from AS 2024-08-11 04:20:46,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4150, loss[loss=0.1026, beats_loss=0.009956, ecapa_loss=0.0002697, whisper_loss=0.0899, over 20604.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01137, ecapa_loss=0.0002144, whisper_loss=0.09594, over 3896431.36 frames. ], batch size: 87, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:20:53,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=911090.0, ans=0.125 2024-08-11 04:20:57,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0 2024-08-11 04:21:00,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=911190.0, ans=0.125 2024-08-11 04:21:28,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=12.0 2024-08-11 04:21:37,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-11 04:21:45,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=911390.0, ans=0.2 2024-08-11 04:21:54,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=911490.0, ans=0.0 2024-08-11 04:22:02,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4200, loss[loss=0.09288, beats_loss=0.01214, ecapa_loss=0.0002045, whisper_loss=0.07869, over 18029.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01148, ecapa_loss=0.0002137, whisper_loss=0.09453, over 3866622.11 frames. 
], batch size: 75, lr: 9.15e-03, grad_scale: 1125899906842624.0 2024-08-11 04:22:34,197 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 11 from Vox, 25 from AS 2024-08-11 04:22:35,782 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-11 04:22:39,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=911790.0, ans=0.0 2024-08-11 04:22:45,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911890.0, ans=0.1 2024-08-11 04:22:57,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-11 04:23:01,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+01 2.712e+01 3.098e+01 3.462e+01 7.406e+01, threshold=6.196e+01, percent-clipped=1.0 2024-08-11 04:23:10,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=911990.0, ans=0.125 2024-08-11 04:23:13,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4250, loss[loss=0.1251, beats_loss=0.0103, ecapa_loss=0.0002197, whisper_loss=0.1126, over 15821.00 frames. ], tot_loss[loss=0.1082, beats_loss=0.01142, ecapa_loss=0.0002145, whisper_loss=0.09466, over 3865050.70 frames. 
], batch size: 63, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:23:15,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=912090.0, ans=0.125 2024-08-11 04:23:31,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912190.0, ans=0.1 2024-08-11 04:23:46,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=912290.0, ans=15.0 2024-08-11 04:23:53,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=22.5 2024-08-11 04:24:05,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=912390.0, ans=0.125 2024-08-11 04:24:06,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=912390.0, ans=0.125 2024-08-11 04:24:12,091 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 04:24:22,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4300, loss[loss=0.1255, beats_loss=0.009701, ecapa_loss=0.0002525, whisper_loss=0.1133, over 19385.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.0114, ecapa_loss=0.0002134, whisper_loss=0.09461, over 3858732.81 frames. ], batch size: 78, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:24:24,049 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 04:24:24,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=912590.0, ans=0.0 2024-08-11 04:24:38,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=912690.0, ans=0.125 2024-08-11 04:24:38,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. limit=10.0 2024-08-11 04:24:42,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=912690.0, ans=0.125 2024-08-11 04:25:04,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=912890.0, ans=0.0 2024-08-11 04:25:05,720 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 04:25:08,767 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:25:15,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=912990.0, ans=0.125 2024-08-11 04:25:16,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.652e+01 2.970e+01 3.355e+01 6.636e+01, threshold=5.939e+01, percent-clipped=1.0 2024-08-11 04:25:16,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=912990.0, ans=0.0 2024-08-11 04:25:19,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=912990.0, ans=0.2 2024-08-11 04:25:21,540 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 04:25:22,893 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 04:25:27,181 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-11 04:25:28,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4350, loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.0002344, whisper_loss=0.08791, over 17327.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01142, ecapa_loss=0.0002134, whisper_loss=0.09409, over 3853286.91 frames. ], batch size: 72, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:25:28,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=913090.0, ans=0.0 2024-08-11 04:25:48,575 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:25:58,731 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:26:04,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=913290.0, ans=0.0 2024-08-11 04:26:16,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=12.0 2024-08-11 04:26:20,775 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 04:26:23,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=913490.0, ans=0.0 2024-08-11 04:26:34,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4400, loss[loss=0.09754, beats_loss=0.01379, ecapa_loss=0.0002019, whisper_loss=0.08173, over 21362.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01139, ecapa_loss=0.0002122, whisper_loss=0.09399, over 3836699.72 frames. 
], batch size: 89, lr: 9.14e-03, grad_scale: 1125899906842624.0 2024-08-11 04:26:34,286 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 04:26:36,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2024-08-11 04:26:44,509 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 04:26:49,941 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 04:27:07,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2024-08-11 04:27:22,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=913890.0, ans=10.0 2024-08-11 04:27:26,585 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 04:27:27,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.550e+01 2.848e+01 3.646e+01 5.843e+01, threshold=5.697e+01, percent-clipped=0.0 2024-08-11 04:27:28,138 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 04:27:33,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=913990.0, ans=0.0 2024-08-11 04:27:35,792 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 04:27:38,505 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 17 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-11 04:27:39,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4450, loss[loss=0.08083, beats_loss=0.0118, ecapa_loss=0.0002489, whisper_loss=0.06654, over 20090.00 frames. 
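The `optim.py` "Clipping_scale" entries above report five grad-norm quartiles and a threshold. Across the logged entries, the threshold is consistently the clipping scale times the middle quartile (e.g. 2.0 × 2.848e+01 ≈ 5.697e+01 in the entry just above). A minimal sketch reproducing that relation — note the interpretation of the five fields as min/q1/median/q3/max is an assumption inferred from the numbers, not taken from the icefall source:

```python
# Hedged sketch: the logged clipping threshold appears to equal
# clipping_scale * median grad-norm.  Quartile values are copied from one
# optim.py log entry above; the min/q1/median/q3/max labeling is assumed.
quartiles = [1.973e+01, 2.550e+01, 2.848e+01, 3.646e+01, 5.843e+01]
clipping_scale = 2.0

# threshold = 2.0 * median grad-norm
threshold = clipping_scale * quartiles[2]
print(threshold)  # ~56.96, matching the logged threshold=5.697e+01
```

Under this reading, `percent-clipped` then reports how many gradients in the window exceeded that threshold.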
], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.000213, whisper_loss=0.09392, over 3848330.06 frames. ], batch size: 86, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:27:43,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914090.0, ans=0.1 2024-08-11 04:27:46,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=914090.0, ans=10.0 2024-08-11 04:27:57,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2024-08-11 04:28:13,954 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 04:28:15,189 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-11 04:28:19,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=12.0 2024-08-11 04:28:42,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=914490.0, ans=0.1 2024-08-11 04:28:43,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2024-08-11 04:28:46,479 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 04:28:50,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-11 04:28:51,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4500, loss[loss=0.1077, beats_loss=0.01067, ecapa_loss=0.0002055, whisper_loss=0.09498, over 19787.00 frames. 
], tot_loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.0002122, whisper_loss=0.09425, over 3848801.70 frames. ], batch size: 79, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:29:04,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=914690.0, ans=0.05 2024-08-11 04:29:07,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=914690.0, ans=0.125 2024-08-11 04:29:12,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=914690.0, ans=0.2 2024-08-11 04:29:28,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=914790.0, ans=0.02 2024-08-11 04:29:31,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=914890.0, ans=0.125 2024-08-11 04:29:45,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=914990.0, ans=0.0 2024-08-11 04:29:47,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=914990.0, ans=0.0 2024-08-11 04:29:47,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.660e+01 3.113e+01 3.675e+01 6.136e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 04:29:52,288 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 04:29:53,451 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 04:29:59,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4550, loss[loss=0.1135, beats_loss=0.01011, ecapa_loss=0.000221, whisper_loss=0.1012, over 18371.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01132, ecapa_loss=0.0002129, whisper_loss=0.09458, over 3868228.89 frames. ], batch size: 73, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:30:04,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5 2024-08-11 04:30:15,948 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 04:30:32,798 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 04:30:37,141 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 04:30:50,289 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 04:31:05,912 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4600, loss[loss=0.09763, beats_loss=0.01608, ecapa_loss=0.0002094, whisper_loss=0.07946, over 21009.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0002124, whisper_loss=0.09383, over 3895666.52 frames. ], batch size: 87, lr: 9.13e-03, grad_scale: 1125899906842624.0 2024-08-11 04:31:06,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=12.0 2024-08-11 04:31:12,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=915590.0, ans=0.0 2024-08-11 04:31:17,931 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 04:31:19,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=915690.0, ans=0.0 2024-08-11 04:31:25,670 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
16 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-11 04:31:34,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=915790.0, ans=0.95 2024-08-11 04:31:52,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.05 vs. limit=10.0 2024-08-11 04:31:54,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=915890.0, ans=0.125 2024-08-11 04:31:59,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.820e+01 3.108e+01 3.626e+01 5.972e+01, threshold=6.216e+01, percent-clipped=0.0 2024-08-11 04:32:04,440 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 04:32:09,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=916090.0, ans=0.125 2024-08-11 04:32:11,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4650, loss[loss=0.1201, beats_loss=0.01146, ecapa_loss=0.0002175, whisper_loss=0.1064, over 23707.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002137, whisper_loss=0.09343, over 3917961.34 frames. ], batch size: 94, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:32:46,242 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 04:32:46,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=916290.0, ans=0.0 2024-08-11 04:33:05,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=916490.0, ans=0.2 2024-08-11 04:33:06,908 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 04:33:14,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0 2024-08-11 04:33:17,524 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4700, loss[loss=0.1334, beats_loss=0.0111, ecapa_loss=0.0001837, whisper_loss=0.1204, over 19489.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01153, ecapa_loss=0.000213, whisper_loss=0.09318, over 3912564.34 frames. ], batch size: 73, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:33:29,946 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 04:33:31,231 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 04:33:59,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2024-08-11 04:34:12,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.755e+01 3.145e+01 3.501e+01 4.476e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 04:34:15,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=916990.0, ans=0.0 2024-08-11 04:34:17,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2024-08-11 04:34:20,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=916990.0, ans=0.0 2024-08-11 04:34:23,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4750, loss[loss=0.09339, beats_loss=0.0131, ecapa_loss=0.0002183, whisper_loss=0.07811, over 20606.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01145, ecapa_loss=0.0002141, whisper_loss=0.09393, over 3915106.94 frames. 
], batch size: 88, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:34:29,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=917090.0, ans=0.125 2024-08-11 04:34:46,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=917190.0, ans=0.1 2024-08-11 04:34:50,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=917290.0, ans=0.0 2024-08-11 04:34:52,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=917290.0, ans=0.0 2024-08-11 04:35:27,796 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 04:35:28,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4800, loss[loss=0.08438, beats_loss=0.01248, ecapa_loss=0.0002077, whisper_loss=0.06982, over 17446.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01159, ecapa_loss=0.0002126, whisper_loss=0.09311, over 3905483.08 frames. ], batch size: 71, lr: 9.12e-03, grad_scale: 1125899906842624.0 2024-08-11 04:35:37,928 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 04:35:46,007 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 04:35:52,308 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-11 04:35:57,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=917790.0, ans=0.125 2024-08-11 04:36:21,164 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
9 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 04:36:22,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.878e+01 3.268e+01 3.983e+01 7.610e+01, threshold=6.536e+01, percent-clipped=1.0 2024-08-11 04:36:34,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4850, loss[loss=0.1017, beats_loss=0.01239, ecapa_loss=0.0002029, whisper_loss=0.08732, over 16621.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0116, ecapa_loss=0.0002116, whisper_loss=0.09279, over 3913751.86 frames. ], batch size: 66, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:36:39,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=918090.0, ans=0.125 2024-08-11 04:36:42,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=918090.0, ans=0.125 2024-08-11 04:37:00,102 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 04:37:16,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-11 04:37:18,179 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 04:37:24,793 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-11 04:37:25,943 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 04:37:36,451 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 04:37:38,775 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4900, loss[loss=0.1342, beats_loss=0.009246, ecapa_loss=0.0002409, whisper_loss=0.1225, over 16362.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01161, ecapa_loss=0.0002101, whisper_loss=0.09291, over 3886700.08 frames. ], batch size: 62, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:37:40,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0 2024-08-11 04:37:43,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=918590.0, ans=0.125 2024-08-11 04:37:51,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=918690.0, ans=0.0 2024-08-11 04:37:52,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=918690.0, ans=0.125 2024-08-11 04:37:55,924 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 04:38:12,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=918790.0, ans=0.125 2024-08-11 04:38:31,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.611e+01 2.961e+01 3.443e+01 6.053e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 04:38:43,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 4950, loss[loss=0.1184, beats_loss=0.01039, ecapa_loss=0.0001965, whisper_loss=0.1061, over 18686.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01156, ecapa_loss=0.0002104, whisper_loss=0.09253, over 3856593.84 frames. ], batch size: 71, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:38:44,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=919090.0, ans=0.125 2024-08-11 04:38:55,803 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
26 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 04:39:18,955 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 04:39:53,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5000, loss[loss=0.1349, beats_loss=0.007799, ecapa_loss=0.0002259, whisper_loss=0.1249, over 19958.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01153, ecapa_loss=0.0002097, whisper_loss=0.09245, over 3831933.30 frames. ], batch size: 75, lr: 9.11e-03, grad_scale: 1125899906842624.0 2024-08-11 04:40:00,048 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 04:40:33,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-08-11 04:40:51,071 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-92000.pt 2024-08-11 04:40:54,898 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.217e+01 2.724e+01 2.983e+01 3.443e+01 5.585e+01, threshold=5.966e+01, percent-clipped=0.0 2024-08-11 04:41:06,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5050, loss[loss=0.09536, beats_loss=0.01072, ecapa_loss=0.0002264, whisper_loss=0.08238, over 16069.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01164, ecapa_loss=0.0002095, whisper_loss=0.09241, over 3823624.58 frames. 
], batch size: 65, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:41:13,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=920090.0, ans=0.1 2024-08-11 04:41:13,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2024-08-11 04:41:38,337 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 04:41:41,004 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 10 from Vox, 36 fro AS 2024-08-11 04:41:48,534 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 04:41:54,633 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:41:55,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-11 04:41:57,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.40 vs. limit=22.5 2024-08-11 04:41:59,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=920390.0, ans=0.0 2024-08-11 04:42:02,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-11 04:42:08,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=920490.0, ans=0.0 2024-08-11 04:42:13,566 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-11 04:42:14,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=920490.0, ans=0.0 2024-08-11 04:42:18,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5100, loss[loss=0.1088, beats_loss=0.0123, ecapa_loss=0.0002199, whisper_loss=0.09432, over 17954.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01157, ecapa_loss=0.0002082, whisper_loss=0.09313, over 3834902.18 frames. ], batch size: 73, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:42:25,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=920590.0, ans=0.125 2024-08-11 04:42:34,626 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 04:42:34,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=920690.0, ans=0.0 2024-08-11 04:42:34,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=920690.0, ans=0.0 2024-08-11 04:42:49,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=920790.0, ans=0.125 2024-08-11 04:42:55,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=920790.0, ans=0.2 2024-08-11 04:43:17,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.667e+01 3.175e+01 3.575e+01 5.874e+01, threshold=6.350e+01, percent-clipped=0.0 2024-08-11 04:43:20,140 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 36 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 04:43:31,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5150, loss[loss=0.1036, beats_loss=0.01493, ecapa_loss=0.0001951, whisper_loss=0.08676, over 22815.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.0115, ecapa_loss=0.000207, whisper_loss=0.09363, over 3852304.12 frames. ], batch size: 92, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:43:48,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.94 vs. limit=22.5 2024-08-11 04:43:49,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=921190.0, ans=0.1 2024-08-11 04:43:51,357 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 04:44:09,070 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-11 04:44:20,247 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS 2024-08-11 04:44:34,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=921490.0, ans=0.125 2024-08-11 04:44:36,603 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 35 from Vox, 35 from AS 2024-08-11 04:44:45,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5200, loss[loss=0.09987, beats_loss=0.01315, ecapa_loss=0.0002006, whisper_loss=0.08471, over 17812.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01145, ecapa_loss=0.0002073, whisper_loss=0.09407, over 3847973.15 frames. ], batch size: 72, lr: 9.10e-03, grad_scale: 2251799813685248.0 2024-08-11 04:44:59,782 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts.
8 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 04:45:04,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=921690.0, ans=0.125 2024-08-11 04:45:15,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=921790.0, ans=0.125 2024-08-11 04:45:37,432 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 04:45:43,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=921890.0, ans=0.125 2024-08-11 04:45:47,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.623e+01 2.992e+01 3.438e+01 5.362e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-11 04:46:00,827 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5250, loss[loss=0.1302, beats_loss=0.009071, ecapa_loss=0.0002031, whisper_loss=0.1191, over 18177.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01146, ecapa_loss=0.000207, whisper_loss=0.09384, over 3870305.07 frames. ], batch size: 68, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:46:14,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=12.0 2024-08-11 04:46:22,223 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 from AS 2024-08-11 04:46:39,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=922290.0, ans=0.2 2024-08-11 04:46:49,582 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 18 from Vox, 43 from AS 2024-08-11 04:47:13,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5300, loss[loss=0.09587, beats_loss=0.01685, ecapa_loss=0.0001576, whisper_loss=0.07745, over 22130.00 frames.
], tot_loss[loss=0.108, beats_loss=0.01139, ecapa_loss=0.0002062, whisper_loss=0.09458, over 3895972.95 frames. ], batch size: 88, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:47:13,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=922590.0, ans=0.125 2024-08-11 04:47:32,446 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 from AS 2024-08-11 04:47:40,952 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 8 from Vox, 34 from AS 2024-08-11 04:47:42,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-11 04:47:43,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=922790.0, ans=0.125 2024-08-11 04:47:50,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=922790.0, ans=0.125 2024-08-11 04:48:03,426 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 24 from Vox, 32 from AS 2024-08-11 04:48:11,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.730e+01 3.116e+01 3.540e+01 5.766e+01, threshold=6.232e+01, percent-clipped=0.0 2024-08-11 04:48:11,407 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 from AS 2024-08-11 04:48:15,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=922990.0, ans=0.125 2024-08-11 04:48:24,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5350, loss[loss=0.118, beats_loss=0.009522, ecapa_loss=0.000228, whisper_loss=0.1062, over 16567.00 frames.
], tot_loss[loss=0.108, beats_loss=0.0114, ecapa_loss=0.0002052, whisper_loss=0.09451, over 3887259.77 frames. ], batch size: 63, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:48:27,609 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 24 from Vox, 33 from AS 2024-08-11 04:48:29,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=923090.0, ans=0.125 2024-08-11 04:48:33,434 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 from AS 2024-08-11 04:48:49,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=923190.0, ans=0.0 2024-08-11 04:49:00,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.14 vs. limit=10.0 2024-08-11 04:49:06,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=923390.0, ans=0.0 2024-08-11 04:49:25,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=923490.0, ans=0.2 2024-08-11 04:49:27,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=923490.0, ans=0.0 2024-08-11 04:49:36,457 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5400, loss[loss=0.1133, beats_loss=0.009478, ecapa_loss=0.0002303, whisper_loss=0.1016, over 15964.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01143, ecapa_loss=0.0002058, whisper_loss=0.09464, over 3899220.85 frames.
], batch size: 63, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:49:39,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=923590.0, ans=0.125 2024-08-11 04:49:51,539 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 from AS 2024-08-11 04:49:55,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=923690.0, ans=0.125 2024-08-11 04:50:07,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2024-08-11 04:50:15,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=923890.0, ans=0.125 2024-08-11 04:50:17,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=923890.0, ans=0.05 2024-08-11 04:50:25,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=923890.0, ans=0.0 2024-08-11 04:50:30,666 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 04:50:31,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.648e+01 2.918e+01 3.540e+01 6.193e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 04:50:36,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=923990.0, ans=0.125 2024-08-11 04:50:36,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=923990.0, ans=0.0 2024-08-11 04:50:42,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=924090.0, ans=0.125 2024-08-11 04:50:43,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5450, loss[loss=0.09001, beats_loss=0.01409, ecapa_loss=0.0001879, whisper_loss=0.07404, over 15716.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.0114, ecapa_loss=0.0002084, whisper_loss=0.09506, over 3895577.88 frames. ], batch size: 63, lr: 9.09e-03, grad_scale: 2251799813685248.0 2024-08-11 04:50:48,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=924090.0, ans=0.125 2024-08-11 04:50:51,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=924090.0, ans=0.2 2024-08-11 04:51:01,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=924190.0, ans=0.0 2024-08-11 04:51:02,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=924190.0, ans=0.125 2024-08-11 04:51:33,449 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts.
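The optim.py lines above report "grad-norm quartiles" as five values plus a threshold and a percent-clipped figure. A hypothetical sketch of how such statistics could be computed from a window of recent gradient norms (assuming the five values are min/Q1/median/Q3/max; the function name and shape are illustrative, not the actual optim.py code):

```python
import numpy as np

def grad_norm_stats(grad_norms, threshold):
    # Five-number summary (min, Q1, median, Q3, max) of recent gradient
    # norms, plus the percentage that exceeded the clipping threshold.
    # This mirrors the format of the optim.py log lines above; the real
    # implementation in icefall may differ.
    summary = np.percentile(grad_norms, [0, 25, 50, 75, 100])
    percent_clipped = 100.0 * float(np.mean(np.asarray(grad_norms) > threshold))
    return summary, percent_clipped

norms = [10.0, 20.0, 30.0, 40.0, 50.0]
summary, pct = grad_norm_stats(norms, threshold=45.0)
print(summary, pct)  # [10. 20. 30. 40. 50.] 20.0
```

With these statistics a trainer can both log the distribution and decide how aggressively to clip, as the "percent-clipped" column in the log suggests.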
20 from LS+wenet, 17 from Vox, 29 from AS 2024-08-11 04:51:37,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=924490.0, ans=0.5 2024-08-11 04:51:47,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-11 04:51:50,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5500, loss[loss=0.1255, beats_loss=0.01136, ecapa_loss=0.0001975, whisper_loss=0.1122, over 17447.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01145, ecapa_loss=0.000207, whisper_loss=0.09415, over 3884366.95 frames. ], batch size: 66, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:51:51,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=924590.0, ans=0.0 2024-08-11 04:51:52,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-11 04:51:53,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-11 04:51:56,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=924590.0, ans=0.1 2024-08-11 04:52:04,262 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS 2024-08-11 04:52:09,187 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
23 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 04:52:28,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=924890.0, ans=0.125 2024-08-11 04:52:44,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.644e+01 3.103e+01 3.543e+01 6.260e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 04:52:56,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5550, loss[loss=0.1106, beats_loss=0.01113, ecapa_loss=0.0002099, whisper_loss=0.09741, over 23583.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01145, ecapa_loss=0.0002076, whisper_loss=0.09395, over 3879168.46 frames. ], batch size: 93, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:53:04,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=925090.0, ans=0.0 2024-08-11 04:53:10,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=925190.0, ans=0.0 2024-08-11 04:53:22,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=925290.0, ans=0.125 2024-08-11 04:53:22,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=925290.0, ans=0.1 2024-08-11 04:53:31,627 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 from AS 2024-08-11 04:53:39,234 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts.
19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-11 04:53:43,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=925390.0, ans=0.0 2024-08-11 04:53:46,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=925390.0, ans=0.125 2024-08-11 04:53:46,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=925390.0, ans=0.125 2024-08-11 04:53:52,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=925490.0, ans=0.04949747468305833 2024-08-11 04:53:55,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=925490.0, ans=0.04949747468305833 2024-08-11 04:54:01,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5600, loss[loss=0.1218, beats_loss=0.00982, ecapa_loss=0.0002135, whisper_loss=0.1098, over 19455.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01146, ecapa_loss=0.0002083, whisper_loss=0.09391, over 3882600.84 frames. ], batch size: 76, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:54:20,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-11 04:54:22,896 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 23 from Vox, 50 from AS 2024-08-11 04:54:27,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=925790.0, ans=0.2 2024-08-11 04:54:27,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-08-11 04:54:28,351 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts.
20 from LS+wenet, 19 from Vox, 28 from AS 2024-08-11 04:54:34,648 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 04:54:38,534 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 from AS 2024-08-11 04:54:38,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-11 04:54:50,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=925890.0, ans=0.0 2024-08-11 04:54:54,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+01 2.711e+01 3.123e+01 3.568e+01 9.227e+01, threshold=6.245e+01, percent-clipped=1.0 2024-08-11 04:55:05,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5650, loss[loss=0.1181, beats_loss=0.01254, ecapa_loss=0.0002084, whisper_loss=0.1035, over 22875.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01161, ecapa_loss=0.0002077, whisper_loss=0.09338, over 3918724.09 frames.
], batch size: 93, lr: 9.08e-03, grad_scale: 2251799813685248.0 2024-08-11 04:55:16,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=926090.0, ans=0.0 2024-08-11 04:55:29,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=926190.0, ans=0.0 2024-08-11 04:55:39,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=926290.0, ans=0.2 2024-08-11 04:55:42,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926290.0, ans=0.1 2024-08-11 04:55:48,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=926390.0, ans=0.125 2024-08-11 04:55:58,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:56:03,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=926490.0, ans=0.09899494936611666 2024-08-11 04:56:04,813 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-11 04:56:10,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5700, loss[loss=0.0941, beats_loss=0.01353, ecapa_loss=0.0002243, whisper_loss=0.07833, over 21450.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01153, ecapa_loss=0.0002101, whisper_loss=0.09341, over 3917303.93 frames. ], batch size: 92, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:56:15,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=926590.0, ans=0.2 2024-08-11 04:56:36,967 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
27 from LS+wenet, 33 from Vox, 34 from AS 2024-08-11 04:56:40,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2024-08-11 04:56:42,276 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 28 from Vox, 24 from AS 2024-08-11 04:56:45,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926790.0, ans=0.1 2024-08-11 04:56:46,084 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 04:56:50,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=926890.0, ans=0.125 2024-08-11 04:56:50,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926890.0, ans=0.1 2024-08-11 04:56:51,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=926890.0, ans=0.125 2024-08-11 04:57:03,975 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.796e+01 3.057e+01 3.549e+01 5.833e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 04:57:04,354 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 04:57:14,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=927090.0, ans=0.1 2024-08-11 04:57:15,667 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5750, loss[loss=0.1085, beats_loss=0.01262, ecapa_loss=0.000182, whisper_loss=0.09408, over 23617.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01143, ecapa_loss=0.000212, whisper_loss=0.09396, over 3892605.41 frames.
], batch size: 94, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:57:16,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-08-11 04:57:17,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=927090.0, ans=0.1 2024-08-11 04:57:28,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2024-08-11 04:57:30,071 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 11 from LS+wenet, 27 from Vox, 26 from AS 2024-08-11 04:57:47,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=927290.0, ans=0.1 2024-08-11 04:57:51,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=927290.0, ans=0.1 2024-08-11 04:57:55,685 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-11 04:58:11,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=927490.0, ans=0.0 2024-08-11 04:58:21,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5800, loss[loss=0.1022, beats_loss=0.01435, ecapa_loss=0.0001859, whisper_loss=0.08602, over 16570.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01146, ecapa_loss=0.0002113, whisper_loss=0.09381, over 3873779.92 frames.
], batch size: 68, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:58:42,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=927690.0, ans=0.125 2024-08-11 04:58:46,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-11 04:58:51,068 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 21 from Vox, 39 from AS 2024-08-11 04:58:52,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=927790.0, ans=0.0 2024-08-11 04:58:52,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2024-08-11 04:59:10,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=927890.0, ans=10.0 2024-08-11 04:59:14,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.689e+01 2.933e+01 3.272e+01 5.873e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 04:59:15,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=927990.0, ans=0.125 2024-08-11 04:59:21,224 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 17 from Vox, 43 from AS 2024-08-11 04:59:25,963 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5850, loss[loss=0.1114, beats_loss=0.01087, ecapa_loss=0.0001794, whisper_loss=0.09878, over 14872.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01151, ecapa_loss=0.0002109, whisper_loss=0.09375, over 3889975.91 frames.
], batch size: 57, lr: 9.07e-03, grad_scale: 2251799813685248.0 2024-08-11 04:59:33,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=928090.0, ans=0.125 2024-08-11 04:59:57,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=928290.0, ans=0.125 2024-08-11 05:00:11,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=928390.0, ans=0.125 2024-08-11 05:00:11,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=928390.0, ans=0.0 2024-08-11 05:00:11,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=928390.0, ans=0.125 2024-08-11 05:00:12,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=928390.0, ans=0.2 2024-08-11 05:00:17,980 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 22 from Vox, 41 from AS 2024-08-11 05:00:23,435 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 25 from Vox, 22 from AS 2024-08-11 05:00:26,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-08-11 05:00:30,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5900, loss[loss=0.09297, beats_loss=0.01338, ecapa_loss=0.0002112, whisper_loss=0.07748, over 20509.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0115, ecapa_loss=0.0002101, whisper_loss=0.09298, over 3851455.20 frames.
], batch size: 88, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:00:32,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=928590.0, ans=0.1 2024-08-11 05:00:38,832 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 from AS 2024-08-11 05:00:46,678 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 10 from Vox, 27 from AS 2024-08-11 05:00:50,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=928690.0, ans=0.0 2024-08-11 05:01:24,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.592e+01 2.867e+01 3.350e+01 5.876e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-11 05:01:25,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=928990.0, ans=0.0 2024-08-11 05:01:34,953 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 from AS 2024-08-11 05:01:35,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=929090.0, ans=0.125 2024-08-11 05:01:36,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 5950, loss[loss=0.1111, beats_loss=0.007231, ecapa_loss=0.0002156, whisper_loss=0.1017, over 15957.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0115, ecapa_loss=0.0002099, whisper_loss=0.09285, over 3849053.09 frames.
], batch size: 62, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:01:39,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=929090.0, ans=0.125 2024-08-11 05:01:44,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=929090.0, ans=0.125 2024-08-11 05:01:57,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-11 05:02:09,059 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 05:02:13,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=929290.0, ans=0.0 2024-08-11 05:02:13,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=929290.0, ans=0.0 2024-08-11 05:02:18,290 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 from AS 2024-08-11 05:02:22,471 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-11 05:02:27,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-11 05:02:34,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=929490.0, ans=0.1 2024-08-11 05:02:36,530 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 05:02:41,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6000, loss[loss=0.1161, beats_loss=0.008426, ecapa_loss=0.0002551, whisper_loss=0.1051, over 17621.00 frames.
], tot_loss[loss=0.1062, beats_loss=0.01152, ecapa_loss=0.0002107, whisper_loss=0.09259, over 3843728.56 frames. ], batch size: 72, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:02:41,662 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 05:03:21,172 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on ASR_libri: loss=0.2594, beats_loss=0, ecapa_loss=0.0006753, whisper_loss=0.2527, over 922467.00 frames. 2024-08-11 05:03:38,395 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on SV_voxceleb1: loss=0.005594, beats_loss=0, ecapa_loss=0.0005594, whisper_loss=0, over 939242.00 frames. 2024-08-11 05:05:33,477 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 05:05:33,482 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 05:05:37,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=12.0 2024-08-11 05:05:38,596 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 from AS 2024-08-11 05:05:39,978 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts.
18 from LS+wenet, 20 from Vox, 17 from AS 2024-08-11 05:05:54,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=929690.0, ans=0.0 2024-08-11 05:05:55,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=929690.0, ans=10.0 2024-08-11 05:06:14,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=929890.0, ans=0.125 2024-08-11 05:06:23,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=929890.0, ans=0.0 2024-08-11 05:06:27,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.542e+01 2.913e+01 3.356e+01 5.863e+01, threshold=5.826e+01, percent-clipped=1.0 2024-08-11 05:06:30,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=929990.0, ans=0.0 2024-08-11 05:06:34,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2024-08-11 05:06:36,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2024-08-11 05:06:38,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6050, loss[loss=0.09823, beats_loss=0.01327, ecapa_loss=0.0002556, whisper_loss=0.0824, over 21737.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01144, ecapa_loss=0.0002113, whisper_loss=0.0932, over 3845382.58 frames. ], batch size: 92, lr: 9.06e-03, grad_scale: 2251799813685248.0 2024-08-11 05:06:54,287 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts.
25 from LS+wenet, 29 from Vox, 28 from AS 2024-08-11 05:07:04,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=12.0 2024-08-11 05:07:04,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0 2024-08-11 05:07:11,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=930290.0, ans=0.1 2024-08-11 05:07:18,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=930390.0, ans=0.1 2024-08-11 05:07:19,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=930390.0, ans=0.125 2024-08-11 05:07:20,895 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS 2024-08-11 05:07:30,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=930490.0, ans=0.125 2024-08-11 05:07:30,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=930490.0, ans=0.125 2024-08-11 05:07:32,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=930490.0, ans=0.2 2024-08-11 05:07:34,669 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 from AS 2024-08-11 05:07:39,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=930490.0, ans=0.1 2024-08-11 05:07:40,266 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
24 from LS+wenet, 20 from Vox, 28 from AS 2024-08-11 05:07:42,848 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 from AS 2024-08-11 05:07:43,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6100, loss[loss=0.1184, beats_loss=0.01102, ecapa_loss=0.0002277, whisper_loss=0.1051, over 22722.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0002114, whisper_loss=0.09388, over 3894756.39 frames. ], batch size: 95, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:07:55,318 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 23 from Vox, 49 from AS 2024-08-11 05:08:04,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=930690.0, ans=0.0 2024-08-11 05:08:12,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=930790.0, ans=0.1 2024-08-11 05:08:16,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2024-08-11 05:08:20,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=930790.0, ans=0.125 2024-08-11 05:08:23,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=930890.0, ans=0.0 2024-08-11 05:08:26,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=930890.0, ans=0.125 2024-08-11 05:08:27,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.75 vs.
limit=15.0 2024-08-11 05:08:32,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=930890.0, ans=0.5 2024-08-11 05:08:37,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.611e+01 2.902e+01 3.349e+01 2.714e+02, threshold=5.803e+01, percent-clipped=1.0 2024-08-11 05:08:39,960 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 05:08:48,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2024-08-11 05:08:48,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6150, loss[loss=0.1171, beats_loss=0.0112, ecapa_loss=0.0002353, whisper_loss=0.1036, over 22509.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01146, ecapa_loss=0.0002108, whisper_loss=0.09396, over 3906194.34 frames. ], batch size: 93, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:09:06,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=931190.0, ans=0.0 2024-08-11 05:09:19,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=931290.0, ans=0.1 2024-08-11 05:09:54,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6200, loss[loss=0.09704, beats_loss=0.01365, ecapa_loss=0.0001995, whisper_loss=0.0814, over 22590.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01151, ecapa_loss=0.0002091, whisper_loss=0.09324, over 3881755.10 frames. ], batch size: 91, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:09:56,179 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 05:09:57,354 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts.
13 from LS+wenet, 17 from Vox, 28 from AS 2024-08-11 05:10:02,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=931590.0, ans=0.125 2024-08-11 05:10:08,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=931690.0, ans=0.0 2024-08-11 05:10:14,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=931690.0, ans=0.0 2024-08-11 05:10:15,622 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 05:10:32,714 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 from AS 2024-08-11 05:10:39,264 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 from AS 2024-08-11 05:10:48,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.318e+01 2.725e+01 3.050e+01 3.372e+01 5.411e+01, threshold=6.100e+01, percent-clipped=0.0 2024-08-11 05:10:52,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=931990.0, ans=0.2 2024-08-11 05:10:55,214 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 05:10:58,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=931990.0, ans=0.0 2024-08-11 05:11:00,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6250, loss[loss=0.1218, beats_loss=0.007276, ecapa_loss=0.0002639, whisper_loss=0.1118, over 16193.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01147, ecapa_loss=0.00021, whisper_loss=0.09336, over 3859551.29 frames. ], batch size: 63, lr: 9.05e-03, grad_scale: 2251799813685248.0 2024-08-11 05:11:03,219 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 05:11:06,971 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 05:11:07,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-11 05:11:11,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=22.5 2024-08-11 05:11:15,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=932190.0, ans=0.0 2024-08-11 05:11:17,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=932190.0, ans=0.0 2024-08-11 05:11:19,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=932190.0, ans=0.125 2024-08-11 05:11:26,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=932290.0, ans=0.125 2024-08-11 05:11:30,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=932290.0, ans=0.0 2024-08-11 05:11:40,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-11 05:11:58,172 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 05:11:59,435 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 05:12:05,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6300, loss[loss=0.1218, beats_loss=0.008507, ecapa_loss=0.0002505, whisper_loss=0.1108, over 19524.00 frames. 
], tot_loss[loss=0.108, beats_loss=0.01135, ecapa_loss=0.00021, whisper_loss=0.09454, over 3878014.87 frames. ], batch size: 76, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:12:11,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=932590.0, ans=0.125 2024-08-11 05:12:48,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=932890.0, ans=0.125 2024-08-11 05:12:59,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.671e+01 3.003e+01 3.406e+01 5.856e+01, threshold=6.007e+01, percent-clipped=0.0 2024-08-11 05:13:04,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-11 05:13:11,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6350, loss[loss=0.1136, beats_loss=0.01242, ecapa_loss=0.0002417, whisper_loss=0.09879, over 22010.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01143, ecapa_loss=0.000209, whisper_loss=0.09396, over 3849981.18 frames. ], batch size: 94, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:13:16,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=933090.0, ans=0.125 2024-08-11 05:13:54,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=933390.0, ans=0.0 2024-08-11 05:14:00,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=933390.0, ans=0.025 2024-08-11 05:14:21,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6400, loss[loss=0.1149, beats_loss=0.01113, ecapa_loss=0.0002047, whisper_loss=0.1018, over 19619.00 frames. 
], tot_loss[loss=0.1069, beats_loss=0.01146, ecapa_loss=0.0002097, whisper_loss=0.09334, over 3843893.52 frames. ], batch size: 78, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:14:26,016 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 05:14:32,332 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 31 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 05:14:35,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=933690.0, ans=0.09899494936611666 2024-08-11 05:14:39,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-08-11 05:14:47,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=933790.0, ans=0.0 2024-08-11 05:14:52,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=933790.0, ans=0.125 2024-08-11 05:15:17,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=22.5 2024-08-11 05:15:17,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.766e+01 3.115e+01 3.539e+01 7.313e+01, threshold=6.229e+01, percent-clipped=3.0 2024-08-11 05:15:19,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-11 05:15:25,643 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 27 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 05:15:29,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.57 vs. 
limit=22.5 2024-08-11 05:15:29,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6450, loss[loss=0.1024, beats_loss=0.01279, ecapa_loss=0.0001512, whisper_loss=0.08807, over 20641.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01144, ecapa_loss=0.0002088, whisper_loss=0.09373, over 3841441.87 frames. ], batch size: 81, lr: 9.04e-03, grad_scale: 2251799813685248.0 2024-08-11 05:15:47,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=934190.0, ans=0.125 2024-08-11 05:15:56,694 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-11 05:16:25,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2024-08-11 05:16:33,226 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 05:16:37,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=934490.0, ans=0.0 2024-08-11 05:16:42,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6500, loss[loss=0.1028, beats_loss=0.009507, ecapa_loss=0.0002671, whisper_loss=0.09063, over 15217.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01141, ecapa_loss=0.0002101, whisper_loss=0.09433, over 3853808.93 frames. ], batch size: 62, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:16:52,367 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 15 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 05:17:05,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. 
limit=22.5 2024-08-11 05:17:10,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=934690.0, ans=0.2 2024-08-11 05:17:11,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=934790.0, ans=0.0 2024-08-11 05:17:20,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=934790.0, ans=0.05 2024-08-11 05:17:29,291 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.153e+02 2024-08-11 05:17:42,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.816e+01 3.248e+01 3.661e+01 5.361e+01, threshold=6.497e+01, percent-clipped=0.0 2024-08-11 05:17:47,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=934990.0, ans=0.125 2024-08-11 05:17:55,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6550, loss[loss=0.1104, beats_loss=0.01038, ecapa_loss=0.0002539, whisper_loss=0.09749, over 22177.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01144, ecapa_loss=0.0002099, whisper_loss=0.09425, over 3884013.15 frames. ], batch size: 92, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:18:14,358 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-11 05:18:20,196 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 05:18:20,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=935190.0, ans=0.07 2024-08-11 05:18:21,341 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 05:18:27,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=935290.0, ans=0.0 2024-08-11 05:19:02,882 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-11 05:19:11,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6600, loss[loss=0.1141, beats_loss=0.01233, ecapa_loss=0.0001908, whisper_loss=0.09982, over 22323.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0002126, whisper_loss=0.09435, over 3884690.98 frames. ], batch size: 90, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:19:12,681 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 05:19:15,203 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-11 05:19:20,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=935590.0, ans=0.0 2024-08-11 05:19:22,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=935590.0, ans=0.125 2024-08-11 05:19:25,858 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 05:19:36,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=935690.0, ans=0.1 2024-08-11 05:19:45,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=935790.0, ans=0.2 2024-08-11 05:19:48,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=935790.0, ans=0.125 2024-08-11 05:19:50,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5 2024-08-11 05:19:53,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=935790.0, ans=0.125 2024-08-11 05:19:54,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=935890.0, ans=0.2 2024-08-11 05:19:58,868 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 05:20:11,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.309e+01 2.766e+01 3.102e+01 3.582e+01 5.637e+01, threshold=6.205e+01, percent-clipped=0.0 2024-08-11 05:20:16,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=935990.0, ans=0.125 2024-08-11 05:20:25,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6650, loss[loss=0.1012, beats_loss=0.01136, ecapa_loss=0.0002323, whisper_loss=0.08756, over 19229.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01133, ecapa_loss=0.000212, whisper_loss=0.09539, over 3914267.86 frames. 
], batch size: 76, lr: 9.03e-03, grad_scale: 2251799813685248.0 2024-08-11 05:20:27,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=936090.0, ans=0.125 2024-08-11 05:20:28,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=936090.0, ans=0.2 2024-08-11 05:20:31,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=936090.0, ans=0.125 2024-08-11 05:20:33,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=936090.0, ans=0.125 2024-08-11 05:20:36,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=936090.0, ans=0.07 2024-08-11 05:20:57,857 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 05:21:01,959 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 05:21:40,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6700, loss[loss=0.1032, beats_loss=0.01108, ecapa_loss=0.0001892, whisper_loss=0.09019, over 19709.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.0114, ecapa_loss=0.0002102, whisper_loss=0.09531, over 3918201.45 frames. ], batch size: 79, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:21:42,014 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
20 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 05:22:39,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.752e+01 3.187e+01 3.868e+01 6.125e+01, threshold=6.373e+01, percent-clipped=0.0 2024-08-11 05:22:43,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=936990.0, ans=0.125 2024-08-11 05:22:52,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6750, loss[loss=0.1064, beats_loss=0.01329, ecapa_loss=0.0001913, whisper_loss=0.09121, over 21789.00 frames. ], tot_loss[loss=0.1087, beats_loss=0.01137, ecapa_loss=0.0002101, whisper_loss=0.09524, over 3901754.04 frames. ], batch size: 90, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:22:58,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.86 vs. limit=10.0 2024-08-11 05:23:04,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-11 05:23:07,960 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:23:14,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=937190.0, ans=0.1 2024-08-11 05:23:20,355 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 05:23:29,347 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 05:23:37,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=937390.0, ans=0.0 2024-08-11 05:23:37,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2024-08-11 05:23:43,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=937390.0, ans=0.0 2024-08-11 05:23:51,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=937490.0, ans=0.125 2024-08-11 05:23:53,096 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 28 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 05:24:06,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6800, loss[loss=0.123, beats_loss=0.008443, ecapa_loss=0.0002091, whisper_loss=0.1125, over 16138.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01138, ecapa_loss=0.0002094, whisper_loss=0.09494, over 3871718.75 frames. ], batch size: 60, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:24:09,300 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-11 05:24:16,285 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 05:24:19,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=937690.0, ans=0.125 2024-08-11 05:24:34,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=937790.0, ans=0.09899494936611666 2024-08-11 05:24:43,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937790.0, ans=0.1 2024-08-11 05:24:47,145 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 05:25:05,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.742e+01 3.088e+01 3.392e+01 5.512e+01, threshold=6.176e+01, percent-clipped=0.0 2024-08-11 05:25:14,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=937990.0, ans=0.125 2024-08-11 05:25:17,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=938090.0, ans=0.125 2024-08-11 05:25:18,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6850, loss[loss=0.1021, beats_loss=0.01076, ecapa_loss=0.0002082, whisper_loss=0.08928, over 13302.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01149, ecapa_loss=0.000209, whisper_loss=0.0935, over 3851355.17 frames. ], batch size: 54, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:25:31,563 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 05:25:42,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=938190.0, ans=0.95 2024-08-11 05:25:53,406 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:25:55,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=938290.0, ans=0.2 2024-08-11 05:26:29,852 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 05:26:30,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=938490.0, ans=0.0 2024-08-11 05:26:31,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=938490.0, ans=0.2 2024-08-11 05:26:33,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6900, loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.0001894, whisper_loss=0.09137, over 20981.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01157, ecapa_loss=0.0002088, whisper_loss=0.09352, over 3873936.43 frames. ], batch size: 77, lr: 9.02e-03, grad_scale: 2251799813685248.0 2024-08-11 05:26:33,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=938590.0, ans=0.0 2024-08-11 05:26:33,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=938590.0, ans=0.2 2024-08-11 05:26:53,228 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 05:26:53,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=938690.0, ans=0.2 2024-08-11 05:27:09,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=938790.0, ans=0.0 2024-08-11 05:27:11,930 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 05:27:13,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=938790.0, ans=0.1 2024-08-11 05:27:34,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.641e+01 3.049e+01 3.440e+01 6.351e+01, threshold=6.099e+01, percent-clipped=1.0 2024-08-11 05:27:48,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 6950, loss[loss=0.1093, beats_loss=0.008676, ecapa_loss=0.0002348, whisper_loss=0.09831, over 21514.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01155, ecapa_loss=0.0002072, whisper_loss=0.0934, over 3876712.93 frames. ], batch size: 89, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:28:08,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=939190.0, ans=0.125 2024-08-11 05:28:09,691 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 05:28:28,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=939290.0, ans=0.125 2024-08-11 05:28:29,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=939290.0, ans=0.1 2024-08-11 05:28:29,412 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:28:33,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=939390.0, ans=0.0 2024-08-11 05:28:43,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=939390.0, ans=0.2 2024-08-11 05:28:48,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=939490.0, ans=0.1 2024-08-11 05:28:58,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=939490.0, ans=0.125 2024-08-11 05:29:01,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7000, loss[loss=0.09535, beats_loss=0.0129, ecapa_loss=0.0001487, whisper_loss=0.08096, over 14837.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01147, ecapa_loss=0.0002084, whisper_loss=0.09362, over 3875983.67 frames. ], batch size: 57, lr: 9.01e-03, grad_scale: 2251799813685248.0 2024-08-11 05:29:19,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=939690.0, ans=0.1 2024-08-11 05:29:22,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.69 vs. 
limit=15.0 2024-08-11 05:29:32,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=939790.0, ans=0.125 2024-08-11 05:29:34,104 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-11 05:29:38,645 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 05:29:42,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=939790.0, ans=0.0 2024-08-11 05:29:59,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.694e+01 2.915e+01 3.195e+01 8.375e+01, threshold=5.830e+01, percent-clipped=1.0 2024-08-11 05:30:04,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-08-11 05:30:11,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7050, loss[loss=0.1032, beats_loss=0.0131, ecapa_loss=0.0001627, whisper_loss=0.08847, over 18536.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01156, ecapa_loss=0.0002083, whisper_loss=0.0929, over 3889589.50 frames. ], batch size: 70, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:30:27,104 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-11 05:30:35,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=940190.0, ans=0.125 2024-08-11 05:30:35,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=940190.0, ans=0.0 2024-08-11 05:30:36,627 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 05:31:19,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=940490.0, ans=0.125 2024-08-11 05:31:22,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7100, loss[loss=0.1078, beats_loss=0.01192, ecapa_loss=0.0002006, whisper_loss=0.09385, over 19502.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01155, ecapa_loss=0.0002081, whisper_loss=0.09332, over 3900677.44 frames. ], batch size: 81, lr: 9.01e-03, grad_scale: 4503599627370496.0 2024-08-11 05:31:30,329 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.225e-01 2024-08-11 05:31:31,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=940590.0, ans=0.2 2024-08-11 05:31:42,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=940690.0, ans=0.0 2024-08-11 05:31:43,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=940690.0, ans=0.125 2024-08-11 05:31:54,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=940790.0, ans=0.05 2024-08-11 05:32:17,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=940890.0, ans=0.0 2024-08-11 05:32:20,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.694e+01 2.982e+01 3.283e+01 5.309e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 05:32:24,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-11 05:32:27,206 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 05:32:30,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940990.0, ans=0.1 2024-08-11 05:32:33,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7150, loss[loss=0.1103, beats_loss=0.01181, ecapa_loss=0.0001514, whisper_loss=0.09695, over 19140.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01165, ecapa_loss=0.000207, whisper_loss=0.09305, over 3914798.77 frames. ], batch size: 74, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:32:35,705 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-11 05:32:37,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=941090.0, ans=0.025 2024-08-11 05:32:43,445 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 31 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 05:32:54,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-11 05:32:57,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=941190.0, ans=0.2 2024-08-11 05:32:58,419 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-11 05:33:07,802 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 05:33:12,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=941290.0, ans=0.125 2024-08-11 05:33:30,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=941390.0, ans=0.125 2024-08-11 05:33:31,833 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 05:33:37,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=941490.0, ans=0.0 2024-08-11 05:33:42,985 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-11 05:33:50,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7200, loss[loss=0.124, beats_loss=0.01182, ecapa_loss=0.0002293, whisper_loss=0.1099, over 22428.00 frames. ], tot_loss[loss=0.107, beats_loss=0.0116, ecapa_loss=0.000207, whisper_loss=0.09332, over 3950670.07 frames. ], batch size: 92, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:33:59,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=941590.0, ans=0.125 2024-08-11 05:34:04,947 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 05:34:14,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-11 05:34:18,706 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 05:34:30,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=941790.0, ans=0.125 2024-08-11 05:34:53,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.691e+01 3.038e+01 3.510e+01 5.388e+01, threshold=6.075e+01, percent-clipped=0.0 2024-08-11 05:35:05,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=942090.0, ans=0.125 2024-08-11 05:35:06,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7250, loss[loss=0.121, beats_loss=0.01116, ecapa_loss=0.000219, whisper_loss=0.1077, over 22242.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01152, ecapa_loss=0.0002075, whisper_loss=0.09403, over 3941641.00 frames. ], batch size: 89, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:35:10,599 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 05:35:12,128 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 05:35:32,356 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-11 05:35:42,843 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 05:35:51,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=942390.0, ans=0.07 2024-08-11 05:36:00,106 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 05:36:11,524 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 05:36:14,820 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 05:36:15,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=942490.0, ans=0.2 2024-08-11 05:36:16,221 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 05:36:22,722 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 05:36:24,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7300, loss[loss=0.1239, beats_loss=0.009496, ecapa_loss=0.0002329, whisper_loss=0.1121, over 19245.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.0002069, whisper_loss=0.09447, over 3910598.45 frames. ], batch size: 78, lr: 9.00e-03, grad_scale: 4503599627370496.0 2024-08-11 05:36:26,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=942590.0, ans=0.0 2024-08-11 05:36:30,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=942590.0, ans=0.2 2024-08-11 05:36:40,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=942690.0, ans=0.125 2024-08-11 05:36:47,805 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 05:36:50,668 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 05:36:54,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. 
limit=15.0 2024-08-11 05:36:55,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=942790.0, ans=0.125 2024-08-11 05:36:58,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=942790.0, ans=0.0 2024-08-11 05:37:22,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=942890.0, ans=0.0 2024-08-11 05:37:28,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.619e+01 2.865e+01 3.274e+01 5.323e+01, threshold=5.731e+01, percent-clipped=0.0 2024-08-11 05:37:42,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7350, loss[loss=0.1092, beats_loss=0.01297, ecapa_loss=0.0002263, whisper_loss=0.09394, over 16286.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01152, ecapa_loss=0.0002074, whisper_loss=0.09408, over 3904521.47 frames. ], batch size: 69, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:37:44,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=943090.0, ans=0.125 2024-08-11 05:37:46,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=943090.0, ans=0.125 2024-08-11 05:37:52,980 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 40 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 05:37:54,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2024-08-11 05:37:57,627 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 05:38:02,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=943190.0, ans=0.5 2024-08-11 05:38:27,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=943290.0, ans=0.95 2024-08-11 05:38:33,461 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 05:38:41,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=943390.0, ans=0.125 2024-08-11 05:38:47,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=943490.0, ans=0.1 2024-08-11 05:39:04,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7400, loss[loss=0.1325, beats_loss=0.009163, ecapa_loss=0.0002151, whisper_loss=0.1212, over 23298.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01151, ecapa_loss=0.0002078, whisper_loss=0.09389, over 3904403.62 frames. ], batch size: 88, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:39:08,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0 2024-08-11 05:39:18,554 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:39:44,064 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 05:39:58,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.55 vs. 
limit=22.5 2024-08-11 05:39:59,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=943890.0, ans=0.125 2024-08-11 05:40:12,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.754e+01 3.134e+01 3.578e+01 6.308e+01, threshold=6.268e+01, percent-clipped=2.0 2024-08-11 05:40:23,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-08-11 05:40:27,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7450, loss[loss=0.1099, beats_loss=0.009453, ecapa_loss=0.0002272, whisper_loss=0.09815, over 18095.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01156, ecapa_loss=0.0002076, whisper_loss=0.09366, over 3905997.83 frames. ], batch size: 74, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:40:49,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2024-08-11 05:41:00,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=944290.0, ans=0.0 2024-08-11 05:41:14,665 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 05:41:21,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=944390.0, ans=0.1 2024-08-11 05:41:27,289 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
17 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 05:41:42,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=944490.0, ans=0.125 2024-08-11 05:41:49,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=944590.0, ans=0.125 2024-08-11 05:41:50,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7500, loss[loss=0.1112, beats_loss=0.01109, ecapa_loss=0.000176, whisper_loss=0.09832, over 22518.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0115, ecapa_loss=0.0002075, whisper_loss=0.09395, over 3907612.51 frames. ], batch size: 89, lr: 8.99e-03, grad_scale: 4503599627370496.0 2024-08-11 05:41:51,166 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-11 05:41:57,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=944590.0, ans=0.09899494936611666 2024-08-11 05:41:58,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=944590.0, ans=0.125 2024-08-11 05:42:23,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2024-08-11 05:42:35,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-08-11 05:42:39,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2024-08-11 05:42:40,688 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:42:41,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=944890.0, ans=0.125 2024-08-11 05:42:49,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=944890.0, ans=0.2 2024-08-11 05:42:54,214 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.583e+01 2.883e+01 3.295e+01 6.050e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 05:43:01,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=944990.0, ans=15.0 2024-08-11 05:43:07,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2024-08-11 05:43:08,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7550, loss[loss=0.1172, beats_loss=0.01164, ecapa_loss=0.0001877, whisper_loss=0.1036, over 22765.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01142, ecapa_loss=0.0002083, whisper_loss=0.09464, over 3911647.66 frames. ], batch size: 88, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:43:10,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. 
limit=15.0 2024-08-11 05:43:31,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=945190.0, ans=0.2 2024-08-11 05:43:37,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=945290.0, ans=0.0 2024-08-11 05:43:37,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2024-08-11 05:43:46,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=945290.0, ans=0.0 2024-08-11 05:43:51,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=945290.0, ans=0.0 2024-08-11 05:43:58,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2024-08-11 05:43:59,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=945390.0, ans=0.2 2024-08-11 05:43:59,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=945390.0, ans=0.125 2024-08-11 05:44:25,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7600, loss[loss=0.09585, beats_loss=0.01261, ecapa_loss=0.0001824, whisper_loss=0.08141, over 21337.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01144, ecapa_loss=0.0002081, whisper_loss=0.09413, over 3913341.19 frames. ], batch size: 87, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:44:27,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. 
limit=15.0 2024-08-11 05:44:34,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=945590.0, ans=0.125 2024-08-11 05:44:41,524 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 05:44:42,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2024-08-11 05:44:46,889 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 05:44:51,322 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 05:45:07,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-08-11 05:45:15,345 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 05:45:21,413 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 05:45:27,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.611e+01 2.976e+01 3.513e+01 5.739e+01, threshold=5.952e+01, percent-clipped=0.0 2024-08-11 05:45:41,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7650, loss[loss=0.1097, beats_loss=0.009532, ecapa_loss=0.0002358, whisper_loss=0.09786, over 20430.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01145, ecapa_loss=0.0002074, whisper_loss=0.09423, over 3867455.41 frames. ], batch size: 82, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:45:49,266 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 05:46:00,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=946190.0, ans=0.0 2024-08-11 05:46:01,091 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.294e-02 2024-08-11 05:46:03,404 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 05:46:18,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=946290.0, ans=0.1 2024-08-11 05:46:49,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=946490.0, ans=0.125 2024-08-11 05:46:52,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=946490.0, ans=0.125 2024-08-11 05:46:52,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=946490.0, ans=0.2 2024-08-11 05:46:58,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7700, loss[loss=0.09862, beats_loss=0.01166, ecapa_loss=0.0001955, whisper_loss=0.08501, over 17511.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01135, ecapa_loss=0.0002081, whisper_loss=0.09465, over 3855926.52 frames. ], batch size: 69, lr: 8.98e-03, grad_scale: 4503599627370496.0 2024-08-11 05:47:06,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=946590.0, ans=0.125 2024-08-11 05:47:28,933 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 05:47:45,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=946890.0, ans=0.125 2024-08-11 05:47:54,022 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 05:48:02,004 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 05:48:03,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.753e+01 2.991e+01 3.515e+01 5.898e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 05:48:12,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=946990.0, ans=0.07 2024-08-11 05:48:16,876 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 05:48:17,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7750, loss[loss=0.09663, beats_loss=0.01409, ecapa_loss=0.0001804, whisper_loss=0.08073, over 18687.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01139, ecapa_loss=0.0002077, whisper_loss=0.09394, over 3867560.00 frames. ], batch size: 75, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:48:26,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-11 05:48:28,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=947090.0, ans=0.0 2024-08-11 05:48:34,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=947190.0, ans=0.0 2024-08-11 05:48:42,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. 
limit=8.0 2024-08-11 05:48:44,895 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-11 05:48:51,077 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 05:48:54,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=947290.0, ans=0.2 2024-08-11 05:49:05,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=947390.0, ans=0.125 2024-08-11 05:49:33,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=947490.0, ans=0.125 2024-08-11 05:49:36,163 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7800, loss[loss=0.09944, beats_loss=0.01139, ecapa_loss=0.0001961, whisper_loss=0.08609, over 16710.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01141, ecapa_loss=0.0002068, whisper_loss=0.09431, over 3862024.88 frames. ], batch size: 64, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:49:46,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=947590.0, ans=0.125 2024-08-11 05:50:07,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. 
limit=15.0 2024-08-11 05:50:39,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.753e+01 3.128e+01 3.537e+01 5.360e+01, threshold=6.257e+01, percent-clipped=0.0 2024-08-11 05:50:39,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=947990.0, ans=0.125 2024-08-11 05:50:53,114 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7850, loss[loss=0.05692, beats_loss=0.01505, ecapa_loss=0.0001861, whisper_loss=0.04, over 13836.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01149, ecapa_loss=0.0002075, whisper_loss=0.09404, over 3852033.17 frames. ], batch size: 56, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:50:57,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=948090.0, ans=0.1 2024-08-11 05:50:57,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=948090.0, ans=10.0 2024-08-11 05:51:00,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=948090.0, ans=0.0 2024-08-11 05:51:13,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=948190.0, ans=0.0 2024-08-11 05:51:30,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=948290.0, ans=0.125 2024-08-11 05:51:37,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=948390.0, ans=0.0 2024-08-11 05:51:45,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. 
limit=10.0 2024-08-11 05:52:09,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7900, loss[loss=0.09633, beats_loss=0.01259, ecapa_loss=0.0001829, whisper_loss=0.08191, over 19374.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01152, ecapa_loss=0.0002066, whisper_loss=0.09451, over 3904225.58 frames. ], batch size: 78, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:52:25,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=948690.0, ans=0.125 2024-08-11 05:52:38,505 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-11 05:52:48,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=948790.0, ans=0.125 2024-08-11 05:52:55,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0 2024-08-11 05:52:55,911 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 05:53:13,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=948990.0, ans=0.125 2024-08-11 05:53:14,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.621e+01 3.000e+01 3.506e+01 5.251e+01, threshold=6.001e+01, percent-clipped=0.0 2024-08-11 05:53:16,196 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 05:53:29,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 7950, loss[loss=0.09814, beats_loss=0.008862, ecapa_loss=0.0002041, whisper_loss=0.08724, over 17205.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01146, ecapa_loss=0.0002075, whisper_loss=0.09446, over 3931632.52 frames. 
], batch size: 66, lr: 8.97e-03, grad_scale: 4503599627370496.0 2024-08-11 05:53:30,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=949090.0, ans=0.0 2024-08-11 05:53:33,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=949090.0, ans=0.035 2024-08-11 05:53:48,613 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 05:53:50,359 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 05:53:56,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=949190.0, ans=0.0 2024-08-11 05:54:06,143 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 05:54:08,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=949290.0, ans=0.125 2024-08-11 05:54:08,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-08-11 05:54:10,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=949290.0, ans=0.04949747468305833 2024-08-11 05:54:24,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=949390.0, ans=0.0 2024-08-11 05:54:28,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=949390.0, ans=0.2 2024-08-11 05:54:36,489 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
11 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 05:54:38,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=949490.0, ans=0.2 2024-08-11 05:54:45,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=949490.0, ans=0.0 2024-08-11 05:54:50,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8000, loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0002086, whisper_loss=0.09082, over 15322.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=0.0002057, whisper_loss=0.09353, over 3908164.81 frames. ], batch size: 56, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:54:58,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=949590.0, ans=0.0 2024-08-11 05:55:07,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=22.5 2024-08-11 05:55:16,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=949690.0, ans=0.0 2024-08-11 05:55:28,034 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 05:55:33,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.24 vs. limit=15.0 2024-08-11 05:55:41,365 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 05:55:58,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.705e+01 3.037e+01 3.592e+01 7.289e+01, threshold=6.074e+01, percent-clipped=2.0 2024-08-11 05:56:10,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8050, loss[loss=0.1153, beats_loss=0.009006, ecapa_loss=0.0002714, whisper_loss=0.1036, over 13551.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0114, ecapa_loss=0.0002063, whisper_loss=0.09424, over 3869343.39 frames. ], batch size: 54, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:56:20,340 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 05:56:23,686 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 05:56:27,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=950190.0, ans=0.0 2024-08-11 05:56:35,561 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 05:56:35,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=950190.0, ans=0.0 2024-08-11 05:56:35,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=950190.0, ans=0.125 2024-08-11 05:57:00,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-11 05:57:08,670 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 05:57:08,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=950390.0, ans=0.2 2024-08-11 05:57:20,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=950490.0, ans=0.125 2024-08-11 05:57:22,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=950490.0, ans=0.125 2024-08-11 05:57:28,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8100, loss[loss=0.1064, beats_loss=0.01101, ecapa_loss=0.0001844, whisper_loss=0.09358, over 17303.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01148, ecapa_loss=0.0002055, whisper_loss=0.09347, over 3868304.39 frames. ], batch size: 69, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:57:42,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=950690.0, ans=0.09899494936611666 2024-08-11 05:58:03,639 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 05:58:07,277 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 05:58:23,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=950890.0, ans=0.125 2024-08-11 05:58:36,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.727e+01 3.067e+01 3.354e+01 4.801e+01, threshold=6.134e+01, percent-clipped=0.0 2024-08-11 05:58:38,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. 
limit=15.0 2024-08-11 05:58:51,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8150, loss[loss=0.1027, beats_loss=0.01127, ecapa_loss=0.0001612, whisper_loss=0.08987, over 19235.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01145, ecapa_loss=0.0002058, whisper_loss=0.09337, over 3879093.05 frames. ], batch size: 73, lr: 8.96e-03, grad_scale: 4503599627370496.0 2024-08-11 05:59:33,923 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 05:59:37,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=951290.0, ans=0.2 2024-08-11 05:59:45,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2024-08-11 05:59:51,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=951390.0, ans=0.125 2024-08-11 06:00:00,596 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-11 06:00:13,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8200, loss[loss=0.115, beats_loss=0.01129, ecapa_loss=0.0001846, whisper_loss=0.1018, over 23903.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01146, ecapa_loss=0.0002072, whisper_loss=0.09298, over 3902198.84 frames. ], batch size: 94, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:00:15,996 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 06:00:24,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=951590.0, ans=0.1 2024-08-11 06:00:28,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=951690.0, ans=0.125 2024-08-11 06:00:31,110 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 06:00:31,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2024-08-11 06:00:51,077 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 06:01:07,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=951890.0, ans=0.125 2024-08-11 06:01:14,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-08-11 06:01:16,445 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 06:01:19,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=951990.0, ans=0.0 2024-08-11 06:01:19,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.662e+01 3.047e+01 3.528e+01 2.595e+02, threshold=6.093e+01, percent-clipped=1.0 2024-08-11 06:01:34,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8250, loss[loss=0.1091, beats_loss=0.01184, ecapa_loss=0.0001848, whisper_loss=0.09545, over 19814.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01147, ecapa_loss=0.0002067, whisper_loss=0.09356, over 3905532.74 frames. 
], batch size: 80, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:01:34,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=952090.0, ans=0.07 2024-08-11 06:01:58,157 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 06:02:04,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=952190.0, ans=0.0 2024-08-11 06:02:06,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=952290.0, ans=0.125 2024-08-11 06:02:07,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=952290.0, ans=0.125 2024-08-11 06:02:25,037 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 06:02:34,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=952390.0, ans=0.2 2024-08-11 06:02:54,264 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8300, loss[loss=0.08451, beats_loss=0.01311, ecapa_loss=0.0002458, whisper_loss=0.06894, over 19225.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002059, whisper_loss=0.09355, over 3919351.29 frames. ], batch size: 84, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:02:56,072 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 06:03:12,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-08-11 06:03:22,886 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 06:03:36,633 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
15 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 06:03:36,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=952790.0, ans=0.125 2024-08-11 06:03:41,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=952890.0, ans=0.0 2024-08-11 06:03:58,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.259e+01 2.727e+01 2.981e+01 3.576e+01 6.756e+01, threshold=5.962e+01, percent-clipped=1.0 2024-08-11 06:04:12,460 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8350, loss[loss=0.08912, beats_loss=0.0145, ecapa_loss=0.0001488, whisper_loss=0.07314, over 22601.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002066, whisper_loss=0.09362, over 3893074.10 frames. ], batch size: 92, lr: 8.95e-03, grad_scale: 4503599627370496.0 2024-08-11 06:04:12,583 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 06:04:19,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=953090.0, ans=0.0 2024-08-11 06:04:26,626 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 06:04:44,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=22.5 2024-08-11 06:05:03,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=12.0 2024-08-11 06:05:06,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-11 06:05:08,574 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 06:05:22,589 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-11 06:05:33,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8400, loss[loss=0.1018, beats_loss=0.01375, ecapa_loss=0.0001853, whisper_loss=0.08617, over 17027.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002068, whisper_loss=0.09404, over 3893417.50 frames. ], batch size: 68, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:06:09,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=953790.0, ans=0.0 2024-08-11 06:06:28,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=953890.0, ans=0.1 2024-08-11 06:06:30,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=953890.0, ans=0.1 2024-08-11 06:06:31,437 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 06:06:40,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.819e+01 3.267e+01 3.747e+01 3.320e+02, threshold=6.533e+01, percent-clipped=4.0 2024-08-11 06:06:42,011 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 06:06:43,801 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 06:06:54,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8450, loss[loss=0.1113, beats_loss=0.009796, ecapa_loss=0.0001886, whisper_loss=0.09964, over 22281.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01134, ecapa_loss=0.0002088, whisper_loss=0.09505, over 3907911.66 frames. 
], batch size: 84, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:07:03,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=954090.0, ans=0.05 2024-08-11 06:07:18,635 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-11 06:07:23,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-11 06:07:33,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=954290.0, ans=0.0 2024-08-11 06:07:42,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=954390.0, ans=0.125 2024-08-11 06:07:43,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=954390.0, ans=0.0 2024-08-11 06:07:53,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=954390.0, ans=0.125 2024-08-11 06:08:01,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=954490.0, ans=0.125 2024-08-11 06:08:05,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=954490.0, ans=0.0 2024-08-11 06:08:07,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=954490.0, ans=0.1 2024-08-11 06:08:07,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=954490.0, ans=0.125 2024-08-11 06:08:17,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8500, loss[loss=0.09798, beats_loss=0.01204, 
ecapa_loss=0.0002047, whisper_loss=0.0839, over 21385.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01137, ecapa_loss=0.0002074, whisper_loss=0.0944, over 3907378.48 frames. ], batch size: 88, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:08:52,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=954790.0, ans=0.1 2024-08-11 06:08:54,137 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 06:09:04,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2024-08-11 06:09:05,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2024-08-11 06:09:19,304 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 06:09:23,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=954990.0, ans=0.1 2024-08-11 06:09:24,143 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 06:09:24,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=954990.0, ans=0.125 2024-08-11 06:09:25,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.679e+01 3.057e+01 3.369e+01 5.558e+01, threshold=6.114e+01, percent-clipped=0.0 2024-08-11 06:09:39,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8550, loss[loss=0.1067, beats_loss=0.01111, ecapa_loss=0.000166, whisper_loss=0.09391, over 21087.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01137, ecapa_loss=0.000207, whisper_loss=0.09423, over 3923581.62 frames. 
], batch size: 81, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:09:41,513 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 06:09:46,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=955090.0, ans=0.125 2024-08-11 06:09:53,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2024-08-11 06:10:15,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=955290.0, ans=0.05 2024-08-11 06:10:20,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=955290.0, ans=0.0 2024-08-11 06:10:26,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=955290.0, ans=0.125 2024-08-11 06:10:41,865 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 06:10:50,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=955490.0, ans=0.125 2024-08-11 06:11:00,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=955490.0, ans=0.125 2024-08-11 06:11:05,362 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8600, loss[loss=0.08966, beats_loss=0.00873, ecapa_loss=0.0001602, whisper_loss=0.07932, over 16987.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01139, ecapa_loss=0.0002056, whisper_loss=0.09443, over 3911025.76 frames. 
], batch size: 62, lr: 8.94e-03, grad_scale: 4503599627370496.0 2024-08-11 06:11:20,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=955690.0, ans=0.125 2024-08-11 06:11:51,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=955790.0, ans=0.125 2024-08-11 06:11:59,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2024-08-11 06:12:01,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=955890.0, ans=0.0 2024-08-11 06:12:08,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.97 vs. limit=22.5 2024-08-11 06:12:08,620 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 10 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 06:12:14,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.782e+01 3.171e+01 3.818e+01 6.085e+01, threshold=6.342e+01, percent-clipped=0.0 2024-08-11 06:12:26,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=955990.0, ans=0.125 2024-08-11 06:12:28,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8650, loss[loss=0.1029, beats_loss=0.01118, ecapa_loss=0.0002321, whisper_loss=0.08935, over 13510.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01136, ecapa_loss=0.0002062, whisper_loss=0.09452, over 3927270.76 frames. ], batch size: 56, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:12:34,163 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
22 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 06:12:35,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=956090.0, ans=0.125 2024-08-11 06:12:37,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956090.0, ans=0.1 2024-08-11 06:12:39,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=956090.0, ans=0.125 2024-08-11 06:13:22,306 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-11 06:13:52,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8700, loss[loss=0.1202, beats_loss=0.008855, ecapa_loss=0.0002188, whisper_loss=0.1092, over 15024.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01133, ecapa_loss=0.000208, whisper_loss=0.09403, over 3891203.44 frames. ], batch size: 57, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:14:04,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=956590.0, ans=0.0 2024-08-11 06:14:06,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=956690.0, ans=0.0 2024-08-11 06:14:10,128 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 06:14:12,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2024-08-11 06:14:20,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-08-11 06:14:23,116 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 06:14:32,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=956790.0, ans=0.0 2024-08-11 06:14:43,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=956890.0, ans=0.0 2024-08-11 06:14:53,305 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 06:14:57,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.743e+01 3.051e+01 3.561e+01 4.836e+01, threshold=6.102e+01, percent-clipped=0.0 2024-08-11 06:15:07,051 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-11 06:15:08,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=956990.0, ans=0.1 2024-08-11 06:15:11,978 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8750, loss[loss=0.1132, beats_loss=0.01058, ecapa_loss=0.0002594, whisper_loss=0.09998, over 21608.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01134, ecapa_loss=0.0002082, whisper_loss=0.09401, over 3864246.84 frames. ], batch size: 88, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:15:15,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=957090.0, ans=0.125 2024-08-11 06:15:26,481 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 16 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-11 06:15:27,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0 2024-08-11 06:15:29,724 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 06:16:04,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=957390.0, ans=0.125 2024-08-11 06:16:15,304 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 06:16:15,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-11 06:16:23,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=957490.0, ans=0.125 2024-08-11 06:16:23,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=957490.0, ans=0.0 2024-08-11 06:16:27,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=957490.0, ans=0.125 2024-08-11 06:16:28,459 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 06:16:29,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8800, loss[loss=0.1048, beats_loss=0.01143, ecapa_loss=0.0002099, whisper_loss=0.09126, over 22589.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01143, ecapa_loss=0.0002077, whisper_loss=0.09331, over 3878496.23 frames. ], batch size: 92, lr: 8.93e-03, grad_scale: 4503599627370496.0 2024-08-11 06:16:34,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=15.0 2024-08-11 06:16:38,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=957590.0, ans=0.2 2024-08-11 06:16:41,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=957590.0, ans=0.0 2024-08-11 06:16:43,505 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 06:17:08,156 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 06:17:32,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-11 06:17:33,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.553e+01 2.761e+01 3.256e+01 4.911e+01, threshold=5.522e+01, percent-clipped=0.0 2024-08-11 06:17:49,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8850, loss[loss=0.111, beats_loss=0.01108, ecapa_loss=0.0001985, whisper_loss=0.0979, over 21526.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01145, ecapa_loss=0.0002068, whisper_loss=0.09282, over 3859467.08 frames. ], batch size: 84, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:17:56,552 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 06:18:20,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-11 06:18:26,719 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 06:18:32,801 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.736e-03 2024-08-11 06:19:10,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8900, loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0002689, whisper_loss=0.08817, over 19400.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01152, ecapa_loss=0.0002062, whisper_loss=0.09263, over 3845821.53 frames. ], batch size: 84, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:19:14,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-11 06:19:28,945 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-11 06:19:30,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=958690.0, ans=0.125 2024-08-11 06:19:31,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=958690.0, ans=0.125 2024-08-11 06:19:38,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=958790.0, ans=0.125 2024-08-11 06:20:09,929 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 06:20:13,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.703e+01 3.133e+01 3.628e+01 5.499e+01, threshold=6.267e+01, percent-clipped=0.0 2024-08-11 06:20:26,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 8950, loss[loss=0.08432, beats_loss=0.01546, ecapa_loss=0.0001674, whisper_loss=0.06719, over 19909.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0116, ecapa_loss=0.0002059, whisper_loss=0.09277, over 3866615.16 frames. 
], batch size: 82, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:20:28,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=959090.0, ans=0.1 2024-08-11 06:20:35,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-08-11 06:20:40,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-11 06:20:41,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:20:44,984 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 06:20:49,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=959190.0, ans=0.125 2024-08-11 06:21:16,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=959390.0, ans=0.0 2024-08-11 06:21:40,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9000, loss[loss=0.09815, beats_loss=0.009583, ecapa_loss=0.0002179, whisper_loss=0.08639, over 18324.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01159, ecapa_loss=0.0002081, whisper_loss=0.09271, over 3871818.63 frames. 
], batch size: 70, lr: 8.92e-03, grad_scale: 4503599627370496.0 2024-08-11 06:21:40,489 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 06:21:54,248 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6903, 5.0186, 4.9105, 3.7564], device='cuda:0') 2024-08-11 06:22:22,401 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on ASR_libri: loss=0.2572, beats_loss=0, ecapa_loss=0.0006695, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 06:22:40,950 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on SV_voxceleb1: loss=0.005671, beats_loss=0, ecapa_loss=0.0005671, whisper_loss=0, over 939242.00 frames. 2024-08-11 06:23:41,070 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8783, 2.0914, 1.8437, 1.5561, 1.3848, 1.3202, 1.6940, 1.8403], device='cuda:0') 2024-08-11 06:24:43,624 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on AT_audioset: loss=0.0256, beats_loss=0.0256, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 06:24:43,629 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 06:24:46,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=959590.0, ans=0.0 2024-08-11 06:24:56,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=959590.0, ans=0.125 2024-08-11 06:25:08,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.67 vs. 
limit=10.0 2024-08-11 06:25:12,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=959790.0, ans=0.5 2024-08-11 06:25:13,805 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-11 06:25:20,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=959790.0, ans=0.125 2024-08-11 06:25:29,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=959890.0, ans=0.1 2024-08-11 06:25:33,431 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 06:25:38,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=959890.0, ans=0.125 2024-08-11 06:25:38,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=959890.0, ans=0.125 2024-08-11 06:25:45,493 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-96000.pt 2024-08-11 06:25:49,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.675e+01 2.932e+01 3.308e+01 5.321e+01, threshold=5.865e+01, percent-clipped=0.0 2024-08-11 06:26:00,885 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 06:26:03,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9050, loss[loss=0.1206, beats_loss=0.01116, ecapa_loss=0.0002558, whisper_loss=0.1068, over 16794.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01154, ecapa_loss=0.0002076, whisper_loss=0.09304, over 3858460.93 frames. 
], batch size: 70, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:26:03,857 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 06:26:15,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=960090.0, ans=0.125 2024-08-11 06:26:30,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=960190.0, ans=0.125 2024-08-11 06:26:39,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=960190.0, ans=0.0 2024-08-11 06:26:42,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=960290.0, ans=0.125 2024-08-11 06:26:46,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=960290.0, ans=0.0 2024-08-11 06:26:52,902 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 06:27:02,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=960390.0, ans=0.125 2024-08-11 06:27:17,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=960490.0, ans=0.1 2024-08-11 06:27:32,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9100, loss[loss=0.1156, beats_loss=0.01132, ecapa_loss=0.000208, whisper_loss=0.1023, over 22173.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.000208, whisper_loss=0.09367, over 3847690.50 frames. 
], batch size: 91, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:27:37,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=960590.0, ans=0.0 2024-08-11 06:27:55,013 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 06:28:12,336 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 06:28:16,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=960790.0, ans=0.0 2024-08-11 06:28:51,507 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 06:28:51,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=960990.0, ans=0.0 2024-08-11 06:28:52,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.825e+01 3.107e+01 3.810e+01 5.498e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 06:28:53,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=960990.0, ans=0.015 2024-08-11 06:29:05,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=960990.0, ans=0.125 2024-08-11 06:29:10,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9150, loss[loss=0.09316, beats_loss=0.01343, ecapa_loss=0.0002153, whisper_loss=0.07758, over 21331.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01145, ecapa_loss=0.0002092, whisper_loss=0.09375, over 3876269.64 frames. ], batch size: 90, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:29:27,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. 
limit=6.0 2024-08-11 06:29:31,792 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 06:30:21,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=961390.0, ans=0.2 2024-08-11 06:30:26,196 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 06:30:36,753 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 06:30:43,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9200, loss[loss=0.1089, beats_loss=0.01315, ecapa_loss=0.000214, whisper_loss=0.09359, over 21934.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01145, ecapa_loss=0.0002091, whisper_loss=0.09375, over 3897664.42 frames. ], batch size: 89, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:30:59,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=961590.0, ans=0.125 2024-08-11 06:31:19,146 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 06:31:27,267 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-11 06:31:55,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=961890.0, ans=0.09899494936611666 2024-08-11 06:32:00,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.03 vs. 
limit=15.0 2024-08-11 06:32:06,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.686e+01 3.168e+01 3.590e+01 6.490e+01, threshold=6.336e+01, percent-clipped=1.0 2024-08-11 06:32:26,100 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9250, loss[loss=0.1361, beats_loss=0.00798, ecapa_loss=0.0002425, whisper_loss=0.1257, over 21795.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01141, ecapa_loss=0.0002083, whisper_loss=0.09425, over 3881206.34 frames. ], batch size: 84, lr: 8.91e-03, grad_scale: 9007199254740992.0 2024-08-11 06:32:40,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=962090.0, ans=0.125 2024-08-11 06:32:43,259 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 06:32:50,145 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 06:32:50,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2024-08-11 06:33:11,424 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 06:33:15,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=12.0 2024-08-11 06:33:16,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=962290.0, ans=0.125 2024-08-11 06:33:26,558 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 06:33:40,376 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 06:33:49,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9300, loss[loss=0.1209, beats_loss=0.01117, ecapa_loss=0.0001959, whisper_loss=0.1078, over 22858.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01143, ecapa_loss=0.000206, whisper_loss=0.09434, over 3922132.68 frames. ], batch size: 91, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:34:05,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=962690.0, ans=0.0 2024-08-11 06:34:06,901 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 06:34:17,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=962790.0, ans=0.125 2024-08-11 06:34:36,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=962890.0, ans=0.5 2024-08-11 06:34:38,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=962890.0, ans=0.125 2024-08-11 06:34:47,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962990.0, ans=0.125 2024-08-11 06:34:50,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.791e+01 3.053e+01 3.524e+01 6.115e+01, threshold=6.107e+01, percent-clipped=0.0 2024-08-11 06:35:03,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9350, loss[loss=0.1067, beats_loss=0.01372, ecapa_loss=0.0002377, whisper_loss=0.09063, over 21664.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01138, ecapa_loss=0.0002085, whisper_loss=0.09436, over 3895190.35 frames. 
], batch size: 91, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:35:18,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0 2024-08-11 06:35:31,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=963190.0, ans=0.1 2024-08-11 06:35:43,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=963290.0, ans=0.2 2024-08-11 06:35:47,799 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 06:35:56,587 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 06:35:59,474 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 06:36:06,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-08-11 06:36:17,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=963590.0, ans=0.125 2024-08-11 06:36:17,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9400, loss[loss=0.1247, beats_loss=0.01023, ecapa_loss=0.0002355, whisper_loss=0.1121, over 18224.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01138, ecapa_loss=0.0002077, whisper_loss=0.09459, over 3869953.02 frames. 
], batch size: 70, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:36:22,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=963590.0, ans=0.125 2024-08-11 06:36:22,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=963590.0, ans=0.125 2024-08-11 06:36:44,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963690.0, ans=0.1 2024-08-11 06:36:46,686 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 06:36:48,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=963790.0, ans=0.0 2024-08-11 06:36:59,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=12.0 2024-08-11 06:37:03,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=963890.0, ans=0.0 2024-08-11 06:37:18,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=963990.0, ans=0.1 2024-08-11 06:37:18,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.687e+01 3.013e+01 3.513e+01 7.296e+01, threshold=6.026e+01, percent-clipped=1.0 2024-08-11 06:37:25,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=963990.0, ans=0.125 2024-08-11 06:37:28,567 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 06:37:32,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9450, loss[loss=0.1216, beats_loss=0.01049, ecapa_loss=0.0001807, whisper_loss=0.1093, over 23021.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01142, ecapa_loss=0.0002074, whisper_loss=0.0943, over 3885240.27 frames. ], batch size: 88, lr: 8.90e-03, grad_scale: 9007199254740992.0 2024-08-11 06:37:32,915 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 06:37:33,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2024-08-11 06:37:36,878 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 06:37:40,522 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 06:38:05,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=964290.0, ans=0.2 2024-08-11 06:38:07,975 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 06:38:11,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=964290.0, ans=0.0 2024-08-11 06:38:37,441 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 29 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-11 06:38:42,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=964490.0, ans=0.1 2024-08-11 06:38:48,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9500, loss[loss=0.07773, beats_loss=0.01321, ecapa_loss=0.0001958, whisper_loss=0.06257, over 22990.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01148, ecapa_loss=0.0002069, whisper_loss=0.09358, over 3886921.09 frames. 
], batch size: 95, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:38:55,285 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 06:39:05,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=964690.0, ans=0.0 2024-08-11 06:39:16,950 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 06:39:20,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=964790.0, ans=0.0 2024-08-11 06:39:24,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=964790.0, ans=0.125 2024-08-11 06:39:30,735 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 06:39:35,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=964890.0, ans=0.0 2024-08-11 06:39:38,042 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-11 06:39:45,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.83 vs. limit=22.5 2024-08-11 06:39:50,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.745e+01 3.159e+01 3.801e+01 1.108e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 06:39:58,086 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 06:40:03,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9550, loss[loss=0.1179, beats_loss=0.01046, ecapa_loss=0.0002022, whisper_loss=0.1054, over 15684.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01146, ecapa_loss=0.0002091, whisper_loss=0.09335, over 3856627.61 frames. 
], batch size: 59, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:40:18,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=965190.0, ans=0.0 2024-08-11 06:40:30,983 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 06:40:39,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965290.0, ans=0.1 2024-08-11 06:40:50,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=965390.0, ans=0.0 2024-08-11 06:40:53,831 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-11 06:40:54,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=965390.0, ans=0.125 2024-08-11 06:41:07,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-08-11 06:41:12,607 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 06:41:13,995 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9600, loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.0002137, whisper_loss=0.09301, over 22752.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01138, ecapa_loss=0.0002087, whisper_loss=0.09379, over 3831419.68 frames. ], batch size: 92, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:41:17,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=965590.0, ans=0.2 2024-08-11 06:41:31,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. 
limit=10.0 2024-08-11 06:41:37,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=965690.0, ans=0.125 2024-08-11 06:41:48,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=965790.0, ans=0.0 2024-08-11 06:41:51,961 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 06:41:55,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=965790.0, ans=0.05 2024-08-11 06:41:59,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=965890.0, ans=0.07 2024-08-11 06:42:07,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2024-08-11 06:42:14,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.765e+01 3.049e+01 3.383e+01 4.788e+01, threshold=6.099e+01, percent-clipped=0.0 2024-08-11 06:42:28,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9650, loss[loss=0.08936, beats_loss=0.01106, ecapa_loss=0.0002217, whisper_loss=0.07609, over 17293.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0002096, whisper_loss=0.09381, over 3846343.34 frames. ], batch size: 71, lr: 8.89e-03, grad_scale: 9007199254740992.0 2024-08-11 06:43:21,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. 
limit=6.0 2024-08-11 06:43:27,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=966490.0, ans=0.2 2024-08-11 06:43:39,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-11 06:43:40,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=966490.0, ans=0.125 2024-08-11 06:43:43,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9700, loss[loss=0.09102, beats_loss=0.01259, ecapa_loss=0.00024, whisper_loss=0.07603, over 19941.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01131, ecapa_loss=0.0002098, whisper_loss=0.09356, over 3834380.22 frames. ], batch size: 83, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:44:08,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=966690.0, ans=0.2 2024-08-11 06:44:12,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=966790.0, ans=0.125 2024-08-11 06:44:13,945 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 06:44:35,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=966890.0, ans=0.0 2024-08-11 06:44:42,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.608e+01 2.892e+01 3.245e+01 5.119e+01, threshold=5.784e+01, percent-clipped=0.0 2024-08-11 06:44:44,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=966990.0, ans=0.0 2024-08-11 06:44:55,496 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9750, loss[loss=0.09464, beats_loss=0.01238, ecapa_loss=0.0002083, whisper_loss=0.08018, over 21606.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0002081, whisper_loss=0.09334, over 3833434.39 frames. ], batch size: 89, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:45:04,209 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 06:45:15,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2024-08-11 06:45:31,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-08-11 06:45:44,597 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-11 06:45:47,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=967390.0, ans=0.2 2024-08-11 06:45:57,653 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 06:46:06,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=967590.0, ans=0.0 2024-08-11 06:46:07,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9800, loss[loss=0.1035, beats_loss=0.01244, ecapa_loss=0.0001689, whisper_loss=0.08936, over 20996.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01138, ecapa_loss=0.0002085, whisper_loss=0.09278, over 3829400.33 frames. ], batch size: 81, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:46:23,646 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 06:46:37,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=967790.0, ans=0.0 2024-08-11 06:46:56,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.27 vs. limit=10.0 2024-08-11 06:47:06,620 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.643e+01 2.929e+01 3.455e+01 6.415e+01, threshold=5.858e+01, percent-clipped=3.0 2024-08-11 06:47:08,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-11 06:47:09,778 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 06:47:19,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9850, loss[loss=0.1194, beats_loss=0.01298, ecapa_loss=0.0001745, whisper_loss=0.1047, over 22817.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01141, ecapa_loss=0.0002075, whisper_loss=0.09399, over 3877988.97 frames. ], batch size: 89, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:47:20,003 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 06:47:29,743 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 06:47:42,820 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 06:48:01,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=968290.0, ans=0.0 2024-08-11 06:48:11,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=968390.0, ans=0.125 2024-08-11 06:48:12,928 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-11 06:48:28,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=968490.0, ans=0.125 2024-08-11 06:48:31,915 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 06:48:33,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=968590.0, ans=0.0 2024-08-11 06:48:34,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9900, loss[loss=0.09483, beats_loss=0.01417, ecapa_loss=0.0002023, whisper_loss=0.07863, over 22441.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01153, ecapa_loss=0.0002056, whisper_loss=0.09381, over 3897410.20 frames. ], batch size: 93, lr: 8.88e-03, grad_scale: 9007199254740992.0 2024-08-11 06:48:46,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=968590.0, ans=0.125 2024-08-11 06:49:12,815 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 06:49:22,053 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 06:49:32,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.797e+01 3.066e+01 3.610e+01 6.025e+01, threshold=6.133e+01, percent-clipped=2.0 2024-08-11 06:49:39,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=968990.0, ans=0.0 2024-08-11 06:49:45,265 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 9950, loss[loss=0.08397, beats_loss=0.01193, ecapa_loss=0.0002492, whisper_loss=0.06954, over 19249.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01154, ecapa_loss=0.0002055, whisper_loss=0.09324, over 3883374.98 frames. ], batch size: 82, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:49:45,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=969090.0, ans=0.0 2024-08-11 06:49:55,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=969090.0, ans=0.0 2024-08-11 06:50:05,038 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 18 from LS+wenet, 33 from Vox, 45 fro AS 2024-08-11 06:50:05,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-11 06:50:07,658 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 06:50:07,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=969190.0, ans=0.125 2024-08-11 06:50:11,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-11 06:50:14,447 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 06:50:20,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=969290.0, ans=0.0 2024-08-11 06:50:23,615 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 41 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 06:50:27,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=969390.0, ans=0.0 2024-08-11 06:50:38,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=969390.0, ans=0.1 2024-08-11 06:50:41,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=969390.0, ans=0.125 2024-08-11 06:50:56,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-08-11 06:50:57,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=969590.0, ans=0.2 2024-08-11 06:50:58,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10000, loss[loss=0.08941, beats_loss=0.0124, ecapa_loss=0.0002051, whisper_loss=0.07495, over 19140.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01163, ecapa_loss=0.0002051, whisper_loss=0.09246, over 3878351.24 frames. ], batch size: 78, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:50:58,506 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 06:51:16,656 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 06:51:20,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. 
limit=10.0 2024-08-11 06:51:22,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2024-08-11 06:51:27,219 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 06:51:36,660 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 06:51:44,052 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-11 06:51:44,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=969890.0, ans=0.125 2024-08-11 06:51:56,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.627e+01 2.974e+01 3.477e+01 5.733e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-11 06:52:09,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10050, loss[loss=0.09152, beats_loss=0.01321, ecapa_loss=0.0002069, whisper_loss=0.07623, over 22060.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002047, whisper_loss=0.09371, over 3893536.43 frames. ], batch size: 93, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:52:35,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=970190.0, ans=0.125 2024-08-11 06:52:49,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=970290.0, ans=0.0 2024-08-11 06:52:54,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2024-08-11 06:52:58,671 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 06:53:07,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2024-08-11 06:53:18,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10100, loss[loss=0.1281, beats_loss=0.008434, ecapa_loss=0.000199, whisper_loss=0.1177, over 16877.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01158, ecapa_loss=0.0002057, whisper_loss=0.09297, over 3910458.20 frames. ], batch size: 62, lr: 8.87e-03, grad_scale: 9007199254740992.0 2024-08-11 06:53:21,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-08-11 06:53:28,719 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 06:53:40,351 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 06:53:51,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=970790.0, ans=0.2 2024-08-11 06:53:52,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=970790.0, ans=0.125 2024-08-11 06:53:55,780 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 06:54:02,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=970890.0, ans=12.0 2024-08-11 06:54:11,571 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.833e+01 3.189e+01 3.704e+01 6.701e+01, threshold=6.379e+01, percent-clipped=2.0 2024-08-11 06:54:11,712 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 06:54:12,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=970990.0, ans=0.125 2024-08-11 06:54:13,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2024-08-11 06:54:19,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.56 vs. limit=22.5 2024-08-11 06:54:23,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10150, loss[loss=0.09342, beats_loss=0.0119, ecapa_loss=0.0001857, whisper_loss=0.07967, over 18042.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01151, ecapa_loss=0.0002081, whisper_loss=0.09295, over 3925711.88 frames. ], batch size: 73, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:54:38,955 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 06:54:40,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-11 06:54:54,761 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 06:54:57,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=971290.0, ans=0.125 2024-08-11 06:55:22,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=971490.0, ans=0.04949747468305833 2024-08-11 06:55:23,901 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 06:55:28,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10200, loss[loss=0.1102, beats_loss=0.009384, ecapa_loss=0.0002319, whisper_loss=0.09852, over 21945.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01149, ecapa_loss=0.0002083, whisper_loss=0.09305, over 3893979.65 frames. ], batch size: 89, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:55:31,522 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 06:55:33,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=971590.0, ans=0.2 2024-08-11 06:55:44,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-11 06:55:46,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=971690.0, ans=0.125 2024-08-11 06:55:58,941 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 06:56:00,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=971790.0, ans=0.04949747468305833 2024-08-11 06:56:12,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0 2024-08-11 06:56:14,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-11 06:56:18,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. 
limit=12.0 2024-08-11 06:56:19,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=971990.0, ans=0.0 2024-08-11 06:56:22,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.593e+01 3.063e+01 3.580e+01 1.842e+02, threshold=6.125e+01, percent-clipped=1.0 2024-08-11 06:56:22,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=971990.0, ans=0.125 2024-08-11 06:56:25,889 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 06:56:28,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=971990.0, ans=0.0 2024-08-11 06:56:31,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=971990.0, ans=10.0 2024-08-11 06:56:33,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10250, loss[loss=0.1232, beats_loss=0.01076, ecapa_loss=0.000241, whisper_loss=0.11, over 21859.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.0114, ecapa_loss=0.0002088, whisper_loss=0.09433, over 3929170.85 frames. ], batch size: 87, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:56:39,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=972090.0, ans=0.2 2024-08-11 06:56:49,700 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 06:56:50,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-08-11 06:57:16,602 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 06:57:21,971 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-11 06:57:33,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972490.0, ans=0.1 2024-08-11 06:57:36,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=972490.0, ans=0.0 2024-08-11 06:57:38,645 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10300, loss[loss=0.1138, beats_loss=0.0117, ecapa_loss=0.0001806, whisper_loss=0.1003, over 22797.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0002072, whisper_loss=0.09417, over 3912926.04 frames. ], batch size: 88, lr: 8.86e-03, grad_scale: 9007199254740992.0 2024-08-11 06:57:39,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=972590.0, ans=0.0 2024-08-11 06:57:45,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=972590.0, ans=0.0 2024-08-11 06:57:51,674 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 06:58:00,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=972690.0, ans=0.125 2024-08-11 06:58:04,673 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.088e-02 2024-08-11 06:58:17,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=972890.0, ans=0.0 2024-08-11 06:58:19,639 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 06:58:30,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=972990.0, ans=0.125 2024-08-11 06:58:31,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.762e+01 3.121e+01 3.725e+01 5.735e+01, threshold=6.242e+01, percent-clipped=0.0 2024-08-11 06:58:42,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10350, loss[loss=0.1071, beats_loss=0.01017, ecapa_loss=0.0002446, whisper_loss=0.09449, over 18414.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.0114, ecapa_loss=0.0002077, whisper_loss=0.09382, over 3916364.19 frames. ], batch size: 74, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:58:46,976 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 06:59:05,277 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 06:59:12,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=973290.0, ans=0.04949747468305833 2024-08-11 06:59:13,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=973290.0, ans=0.125 2024-08-11 06:59:16,738 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 06:59:31,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=973390.0, ans=0.0 2024-08-11 06:59:41,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=973490.0, ans=0.0 2024-08-11 06:59:42,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=973490.0, ans=0.0 2024-08-11 06:59:48,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10400, loss[loss=0.1175, beats_loss=0.01138, ecapa_loss=0.0002164, whisper_loss=0.1039, over 21702.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0002065, whisper_loss=0.09385, over 3912138.49 frames. ], batch size: 87, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 06:59:53,234 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 07:00:05,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=973690.0, ans=0.125 2024-08-11 07:00:16,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2024-08-11 07:00:31,727 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 07:00:36,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. 
limit=10.0 2024-08-11 07:00:38,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=973890.0, ans=0.0 2024-08-11 07:00:42,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.630e+01 2.925e+01 3.255e+01 4.896e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-11 07:00:50,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=973990.0, ans=0.025 2024-08-11 07:00:53,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10450, loss[loss=0.1148, beats_loss=0.0129, ecapa_loss=0.0002445, whisper_loss=0.09942, over 17904.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01133, ecapa_loss=0.0002084, whisper_loss=0.09376, over 3905945.45 frames. ], batch size: 75, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:01:20,052 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 07:01:25,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=974290.0, ans=0.125 2024-08-11 07:01:38,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=974390.0, ans=10.0 2024-08-11 07:01:54,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2024-08-11 07:01:58,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=974490.0, ans=0.035 2024-08-11 07:02:02,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10500, loss[loss=0.0958, beats_loss=0.01277, ecapa_loss=0.0002161, whisper_loss=0.08087, over 21538.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0002093, whisper_loss=0.09316, over 3891689.95 frames. 
], batch size: 90, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:02:25,657 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 07:02:35,103 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 07:02:46,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-11 07:02:51,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=974890.0, ans=0.035 2024-08-11 07:02:55,215 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-11 07:02:57,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.245e+01 2.661e+01 2.970e+01 3.368e+01 5.123e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-11 07:03:03,704 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:03:09,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-11 07:03:10,152 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10550, loss[loss=0.07067, beats_loss=0.01163, ecapa_loss=0.0002137, whisper_loss=0.0569, over 18690.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01145, ecapa_loss=0.0002084, whisper_loss=0.09318, over 3908990.22 frames. ], batch size: 75, lr: 8.85e-03, grad_scale: 9007199254740992.0 2024-08-11 07:03:15,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=975090.0, ans=0.09899494936611666 2024-08-11 07:03:37,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. 
limit=15.0 2024-08-11 07:03:46,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=975290.0, ans=0.125 2024-08-11 07:03:51,949 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 07:04:02,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=975390.0, ans=0.125 2024-08-11 07:04:03,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=975490.0, ans=0.125 2024-08-11 07:04:08,657 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 07:04:18,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10600, loss[loss=0.1072, beats_loss=0.01098, ecapa_loss=0.0002523, whisper_loss=0.09367, over 21366.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01142, ecapa_loss=0.0002078, whisper_loss=0.09317, over 3931762.81 frames. ], batch size: 91, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:04:31,741 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 07:04:54,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=975790.0, ans=0.125 2024-08-11 07:04:55,259 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 07:05:10,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. 
limit=6.0 2024-08-11 07:05:11,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.786e+01 3.038e+01 3.518e+01 8.413e+01, threshold=6.076e+01, percent-clipped=1.0 2024-08-11 07:05:23,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10650, loss[loss=0.1016, beats_loss=0.01186, ecapa_loss=0.0002055, whisper_loss=0.08769, over 16311.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01143, ecapa_loss=0.0002065, whisper_loss=0.09359, over 3900039.78 frames. ], batch size: 65, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:05:37,663 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 07:05:38,950 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 07:05:39,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=976190.0, ans=0.125 2024-08-11 07:05:40,059 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 16 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 07:05:42,498 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 07:05:45,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-11 07:05:46,444 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 07:05:47,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=976190.0, ans=0.0 2024-08-11 07:05:51,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=976290.0, ans=0.09899494936611666 2024-08-11 07:05:59,727 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 07:06:10,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.34 vs. limit=10.0 2024-08-11 07:06:29,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10700, loss[loss=0.1219, beats_loss=0.01019, ecapa_loss=0.0002007, whisper_loss=0.1097, over 22834.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01142, ecapa_loss=0.0002051, whisper_loss=0.09417, over 3914175.46 frames. ], batch size: 91, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:06:38,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-11 07:06:53,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=976690.0, ans=0.0 2024-08-11 07:06:53,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=976690.0, ans=0.125 2024-08-11 07:06:54,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=976690.0, ans=0.1 2024-08-11 07:06:55,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=976790.0, ans=0.07 2024-08-11 07:07:02,066 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 07:07:24,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.712e+01 3.090e+01 3.800e+01 9.134e+01, threshold=6.180e+01, percent-clipped=2.0 2024-08-11 07:07:30,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2024-08-11 07:07:36,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10750, loss[loss=0.1423, beats_loss=0.008899, ecapa_loss=0.0001822, whisper_loss=0.1315, over 24063.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01136, ecapa_loss=0.0002058, whisper_loss=0.09487, over 3926845.40 frames. ], batch size: 89, lr: 8.84e-03, grad_scale: 9007199254740992.0 2024-08-11 07:07:43,193 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 07:08:22,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=977390.0, ans=0.0 2024-08-11 07:08:31,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=977490.0, ans=0.0 2024-08-11 07:08:42,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=977590.0, ans=0.125 2024-08-11 07:08:43,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10800, loss[loss=0.09313, beats_loss=0.01089, ecapa_loss=0.000234, whisper_loss=0.07989, over 15745.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01142, ecapa_loss=0.000206, whisper_loss=0.0949, over 3943685.21 frames. 
], batch size: 64, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:08:52,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=977590.0, ans=0.1 2024-08-11 07:08:53,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=977590.0, ans=0.1 2024-08-11 07:09:16,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=977790.0, ans=0.125 2024-08-11 07:09:35,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=977890.0, ans=0.125 2024-08-11 07:09:39,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.607e+01 2.912e+01 3.510e+01 6.638e+01, threshold=5.825e+01, percent-clipped=1.0 2024-08-11 07:09:43,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=977990.0, ans=0.125 2024-08-11 07:09:49,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=977990.0, ans=0.0 2024-08-11 07:09:51,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10850, loss[loss=0.1312, beats_loss=0.008511, ecapa_loss=0.0002527, whisper_loss=0.1202, over 19776.00 frames. ], tot_loss[loss=0.1088, beats_loss=0.01137, ecapa_loss=0.0002078, whisper_loss=0.09531, over 3938276.73 frames. ], batch size: 76, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:10:20,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-11 07:10:27,150 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 07:10:42,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=978390.0, ans=0.0 2024-08-11 07:10:44,813 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 07:10:59,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10900, loss[loss=0.1075, beats_loss=0.01189, ecapa_loss=0.000238, whisper_loss=0.09321, over 20880.00 frames. ], tot_loss[loss=0.1099, beats_loss=0.0113, ecapa_loss=0.0002088, whisper_loss=0.09651, over 3962630.81 frames. ], batch size: 90, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:11:02,788 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 07:11:03,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=978590.0, ans=0.0 2024-08-11 07:11:06,569 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 07:11:06,804 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.866e+02 2024-08-11 07:11:08,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978590.0, ans=0.1 2024-08-11 07:11:16,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=978690.0, ans=0.125 2024-08-11 07:11:54,171 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 07:11:55,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.834e+01 3.154e+01 3.675e+01 5.808e+01, threshold=6.308e+01, percent-clipped=0.0 2024-08-11 07:11:56,782 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 35 from Vox, 26 fro AS 2024-08-11 07:12:03,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=978990.0, ans=0.125 2024-08-11 07:12:07,476 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 10950, loss[loss=0.111, beats_loss=0.008868, ecapa_loss=0.0002434, whisper_loss=0.09968, over 19016.00 frames. ], tot_loss[loss=0.1096, beats_loss=0.01128, ecapa_loss=0.0002087, whisper_loss=0.09626, over 3968629.12 frames. ], batch size: 81, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:12:22,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2024-08-11 07:12:39,722 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 07:12:46,371 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 07:12:57,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=979390.0, ans=0.2 2024-08-11 07:12:59,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=979390.0, ans=0.125 2024-08-11 07:13:00,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=979490.0, ans=0.2 2024-08-11 07:13:07,813 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 07:13:13,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11000, loss[loss=0.1057, beats_loss=0.0132, ecapa_loss=0.000222, whisper_loss=0.09023, over 22709.00 frames. ], tot_loss[loss=0.1095, beats_loss=0.01127, ecapa_loss=0.0002071, whisper_loss=0.09615, over 3970457.18 frames. 
], batch size: 94, lr: 8.83e-03, grad_scale: 9007199254740992.0 2024-08-11 07:13:15,341 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-11 07:13:22,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-11 07:13:55,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=979890.0, ans=0.1 2024-08-11 07:14:08,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.630e+01 2.984e+01 3.392e+01 5.712e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 07:14:19,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=12.0 2024-08-11 07:14:20,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=12.0 2024-08-11 07:14:20,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11050, loss[loss=0.1256, beats_loss=0.01126, ecapa_loss=0.0001794, whisper_loss=0.1125, over 20333.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01136, ecapa_loss=0.000206, whisper_loss=0.09523, over 3964370.64 frames. 
], batch size: 80, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:14:29,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=980090.0, ans=0.0 2024-08-11 07:14:29,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=980090.0, ans=0.125 2024-08-11 07:14:32,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=980190.0, ans=0.125 2024-08-11 07:14:37,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=980190.0, ans=0.1 2024-08-11 07:14:40,928 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 07:14:42,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=980190.0, ans=0.5 2024-08-11 07:15:10,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-08-11 07:15:12,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=980390.0, ans=0.125 2024-08-11 07:15:13,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=980490.0, ans=0.0 2024-08-11 07:15:15,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-11 07:15:28,055 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11100, loss[loss=0.1133, beats_loss=0.01028, ecapa_loss=0.000188, whisper_loss=0.1012, over 16331.00 frames. 
], tot_loss[loss=0.1091, beats_loss=0.01133, ecapa_loss=0.0002055, whisper_loss=0.09574, over 3948065.37 frames. ], batch size: 62, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:15:28,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=980590.0, ans=0.2 2024-08-11 07:15:40,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980690.0, ans=0.1 2024-08-11 07:15:41,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=980690.0, ans=0.0 2024-08-11 07:15:45,406 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 07:15:51,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=980690.0, ans=0.125 2024-08-11 07:15:56,425 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 07:16:10,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-08-11 07:16:23,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+01 2.722e+01 3.049e+01 3.591e+01 6.029e+01, threshold=6.098e+01, percent-clipped=1.0 2024-08-11 07:16:25,080 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-11 07:16:27,940 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-11 07:16:28,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=980990.0, ans=0.1 2024-08-11 07:16:36,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11150, loss[loss=0.1176, beats_loss=0.01139, ecapa_loss=0.0001947, whisper_loss=0.1043, over 19229.00 frames. ], tot_loss[loss=0.1085, beats_loss=0.01133, ecapa_loss=0.000205, whisper_loss=0.09513, over 3911143.93 frames. ], batch size: 77, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:16:39,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=981090.0, ans=0.0 2024-08-11 07:16:42,244 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 07:16:42,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2024-08-11 07:16:55,374 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-11 07:17:04,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=981290.0, ans=0.125 2024-08-11 07:17:14,468 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 07:17:23,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=981390.0, ans=0.1 2024-08-11 07:17:40,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=981490.0, ans=0.2 2024-08-11 07:17:43,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11200, loss[loss=0.1117, beats_loss=0.01205, ecapa_loss=0.0002088, whisper_loss=0.09758, over 22976.00 frames. 
], tot_loss[loss=0.1084, beats_loss=0.01138, ecapa_loss=0.000204, whisper_loss=0.09496, over 3925923.55 frames. ], batch size: 92, lr: 8.82e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:17:48,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=981590.0, ans=0.0 2024-08-11 07:17:49,525 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:18:05,628 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-11 07:18:09,608 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 07:18:35,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=981890.0, ans=0.1 2024-08-11 07:18:39,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+01 2.676e+01 2.993e+01 3.397e+01 5.977e+01, threshold=5.986e+01, percent-clipped=0.0 2024-08-11 07:18:51,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11250, loss[loss=0.1128, beats_loss=0.01187, ecapa_loss=0.0002117, whisper_loss=0.09883, over 22023.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01135, ecapa_loss=0.0002057, whisper_loss=0.09503, over 3903072.37 frames. ], batch size: 85, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:18:56,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=982090.0, ans=0.125 2024-08-11 07:19:02,303 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
26 from LS+wenet, 27 from Vox, 42 from AS 2024-08-11 07:19:03,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=982190.0, ans=0.125 2024-08-11 07:19:05,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=982190.0, ans=0.2 2024-08-11 07:19:23,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=982290.0, ans=0.125 2024-08-11 07:19:35,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=982390.0, ans=0.125 2024-08-11 07:19:51,916 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 from AS 2024-08-11 07:19:53,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=982490.0, ans=0.125 2024-08-11 07:19:59,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11300, loss[loss=0.124, beats_loss=0.009283, ecapa_loss=0.0002014, whisper_loss=0.1128, over 22510.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01131, ecapa_loss=0.0002051, whisper_loss=0.09414, over 3895622.72 frames. ], batch size: 90, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:20:00,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.74 vs. 
limit=22.5 2024-08-11 07:20:05,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=982590.0, ans=0.125 2024-08-11 07:20:08,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=982590.0, ans=0.125 2024-08-11 07:20:10,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=982590.0, ans=0.2 2024-08-11 07:20:12,469 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 07:20:17,660 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 17 from Vox, 37 from AS 2024-08-11 07:20:33,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=982790.0, ans=0.0 2024-08-11 07:20:35,997 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 07:20:37,504 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 from AS 2024-08-11 07:20:37,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=982790.0, ans=0.0 2024-08-11 07:20:53,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.719e+01 3.008e+01 3.388e+01 1.679e+02, threshold=6.016e+01, percent-clipped=1.0 2024-08-11 07:21:05,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11350, loss[loss=0.08915, beats_loss=0.01031, ecapa_loss=0.0002492, whisper_loss=0.07635, over 14234.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01134, ecapa_loss=0.0002053, whisper_loss=0.09422, over 3915698.59 frames. 
], batch size: 56, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:21:08,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=983090.0, ans=0.0 2024-08-11 07:21:13,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=983090.0, ans=0.0 2024-08-11 07:21:26,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=983190.0, ans=0.2 2024-08-11 07:21:30,007 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 from AS 2024-08-11 07:21:30,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=983290.0, ans=0.0 2024-08-11 07:21:33,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=12.0 2024-08-11 07:22:01,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=983490.0, ans=0.125 2024-08-11 07:22:10,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11400, loss[loss=0.1149, beats_loss=0.009298, ecapa_loss=0.0002267, whisper_loss=0.1033, over 20435.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01133, ecapa_loss=0.0002051, whisper_loss=0.09404, over 3871665.35 frames. ], batch size: 80, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:22:13,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=983590.0, ans=0.125 2024-08-11 07:22:14,176 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
35 from LS+wenet, 13 from Vox, 41 from AS 2024-08-11 07:22:27,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=983690.0, ans=0.125 2024-08-11 07:22:41,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-11 07:22:42,269 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 from AS 2024-08-11 07:22:46,031 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 07:22:47,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=983890.0, ans=0.2 2024-08-11 07:22:53,888 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.116e-02 2024-08-11 07:22:57,473 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 21 from Vox, 50 from AS 2024-08-11 07:23:00,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=983990.0, ans=0.125 2024-08-11 07:23:02,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.896e+01 3.252e+01 3.905e+01 6.465e+01, threshold=6.504e+01, percent-clipped=1.0 2024-08-11 07:23:05,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=983990.0, ans=0.0 2024-08-11 07:23:13,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11450, loss[loss=0.1252, beats_loss=0.009075, ecapa_loss=0.0002019, whisper_loss=0.1141, over 22466.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01134, ecapa_loss=0.000204, whisper_loss=0.09427, over 3886294.20 frames. 
], batch size: 88, lr: 8.81e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:23:14,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=984090.0, ans=0.0 2024-08-11 07:23:17,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2024-08-11 07:23:18,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=984090.0, ans=0.0 2024-08-11 07:23:19,279 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 43 from LS+wenet, 12 from Vox, 36 from AS 2024-08-11 07:23:34,844 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 from AS 2024-08-11 07:23:35,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=984190.0, ans=0.125 2024-08-11 07:23:37,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2024-08-11 07:23:39,900 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 15 from Vox, 49 from AS 2024-08-11 07:23:45,542 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 from AS 2024-08-11 07:23:46,945 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 28 from Vox, 28 from AS 2024-08-11 07:24:13,209 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 18 from Vox, 48 from AS 2024-08-11 07:24:20,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984490.0, ans=0.1 2024-08-11 07:24:23,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11500, loss[loss=0.1276, beats_loss=0.01073, ecapa_loss=0.0002063, whisper_loss=0.1148, over 14119.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01136, ecapa_loss=0.0002022, whisper_loss=0.09411, over 3913462.66 frames. ], batch size: 56, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:24:29,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984590.0, ans=0.1 2024-08-11 07:24:38,297 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-11 07:25:38,048 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 17 from Vox, 42 from AS 2024-08-11 07:25:38,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=984890.0, ans=0.125 2024-08-11 07:25:45,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.765e+01 3.010e+01 3.592e+01 5.034e+01, threshold=6.021e+01, percent-clipped=0.0 2024-08-11 07:25:45,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=984990.0, ans=0.125 2024-08-11 07:26:03,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11550, loss[loss=0.09952, beats_loss=0.01515, ecapa_loss=0.0001384, whisper_loss=0.08298, over 18597.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0114, ecapa_loss=0.0002016, whisper_loss=0.09426, over 3911533.35 frames. 
], batch size: 73, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:26:15,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=985090.0, ans=0.125 2024-08-11 07:26:18,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985090.0, ans=0.1 2024-08-11 07:26:31,225 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-11 07:26:38,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=985190.0, ans=0.0 2024-08-11 07:26:39,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=985190.0, ans=0.125 2024-08-11 07:26:39,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=985190.0, ans=0.0 2024-08-11 07:27:07,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=985390.0, ans=0.125 2024-08-11 07:27:11,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985390.0, ans=0.1 2024-08-11 07:27:17,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=985390.0, ans=0.025 2024-08-11 07:27:17,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=985390.0, ans=0.125 2024-08-11 07:27:25,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=985390.0, ans=0.0 2024-08-11 07:27:34,127 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=985490.0, ans=0.125 2024-08-11 07:27:52,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11600, loss[loss=0.1043, beats_loss=0.01135, ecapa_loss=0.0001992, whisper_loss=0.091, over 21850.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.000203, whisper_loss=0.09396, over 3920182.50 frames. ], batch size: 85, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:28:31,013 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 from AS 2024-08-11 07:28:39,158 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 from AS 2024-08-11 07:28:43,367 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 16 from Vox, 36 from AS 2024-08-11 07:29:02,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=985790.0, ans=0.1 2024-08-11 07:29:02,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-11 07:29:03,813 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 20 from Vox, 50 from AS 2024-08-11 07:29:21,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=985890.0, ans=0.125 2024-08-11 07:29:23,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=985890.0, ans=0.2 2024-08-11 07:29:29,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. 
limit=15.0 2024-08-11 07:29:29,863 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.587e+01 2.898e+01 3.413e+01 5.144e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-11 07:29:44,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11650, loss[loss=0.09926, beats_loss=0.01484, ecapa_loss=0.0001374, whisper_loss=0.08305, over 16769.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01146, ecapa_loss=0.0002032, whisper_loss=0.09307, over 3915311.98 frames. ], batch size: 62, lr: 8.80e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:30:10,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2024-08-11 07:30:11,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2024-08-11 07:30:15,975 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 07:30:34,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=986290.0, ans=0.0 2024-08-11 07:30:49,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=986390.0, ans=0.125 2024-08-11 07:30:57,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=986490.0, ans=0.0 2024-08-11 07:31:05,240 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-11 07:31:13,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11700, loss[loss=0.1233, beats_loss=0.009603, ecapa_loss=0.0002293, whisper_loss=0.1114, over 19370.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01152, ecapa_loss=0.000205, whisper_loss=0.0929, over 3953853.47 frames. 
], batch size: 77, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:31:13,785 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 07:31:22,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=986590.0, ans=0.125 2024-08-11 07:31:26,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=986590.0, ans=0.0 2024-08-11 07:31:40,196 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 from AS 2024-08-11 07:31:49,311 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 from AS 2024-08-11 07:31:49,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=986790.0, ans=0.125 2024-08-11 07:31:57,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-11 07:31:59,029 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
26 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 07:32:03,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=986890.0, ans=0.125 2024-08-11 07:32:05,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=986890.0, ans=0.1 2024-08-11 07:32:19,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=986890.0, ans=0.0 2024-08-11 07:32:24,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+01 2.842e+01 3.149e+01 3.845e+01 7.778e+01, threshold=6.297e+01, percent-clipped=3.0 2024-08-11 07:32:39,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11750, loss[loss=0.09557, beats_loss=0.01085, ecapa_loss=0.0001823, whisper_loss=0.0829, over 14420.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01155, ecapa_loss=0.0002053, whisper_loss=0.09291, over 3941364.04 frames. ], batch size: 55, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:32:47,833 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 36 from Vox, 32 from AS 2024-08-11 07:33:23,193 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 from AS 2024-08-11 07:33:40,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-08-11 07:33:50,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=987490.0, ans=0.125 2024-08-11 07:34:09,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11800, loss[loss=0.09376, beats_loss=0.01311, ecapa_loss=0.0001635, whisper_loss=0.07902, over 14788.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01152, ecapa_loss=0.0002048, whisper_loss=0.09343, over 3950071.51 frames. 
], batch size: 58, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:34:16,232 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 from AS 2024-08-11 07:35:18,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=987990.0, ans=0.0 2024-08-11 07:35:18,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.720e+01 3.073e+01 3.423e+01 3.198e+02, threshold=6.145e+01, percent-clipped=1.0 2024-08-11 07:35:36,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11850, loss[loss=0.1032, beats_loss=0.01142, ecapa_loss=0.0002329, whisper_loss=0.08949, over 17357.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01148, ecapa_loss=0.0002061, whisper_loss=0.09366, over 3919553.22 frames. ], batch size: 72, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:35:51,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=988090.0, ans=0.0 2024-08-11 07:35:55,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=988190.0, ans=0.125 2024-08-11 07:35:58,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=988190.0, ans=0.0 2024-08-11 07:36:05,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=988190.0, ans=0.0 2024-08-11 07:36:12,573 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 from AS 2024-08-11 07:36:26,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=988290.0, ans=0.0 2024-08-11 07:36:33,586 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 23 from Vox, 26 from AS 2024-08-11 07:36:34,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2024-08-11 07:36:38,707 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 from AS 2024-08-11 07:36:53,101 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 10 from Vox, 27 from AS 2024-08-11 07:37:01,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11900, loss[loss=0.1233, beats_loss=0.01039, ecapa_loss=0.0002199, whisper_loss=0.1107, over 21301.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01146, ecapa_loss=0.0002061, whisper_loss=0.09386, over 3933972.94 frames. ], batch size: 88, lr: 8.79e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:37:05,376 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 from AS 2024-08-11 07:37:22,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988690.0, ans=0.1 2024-08-11 07:37:30,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-11 07:37:49,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.55 vs. limit=22.5 2024-08-11 07:37:56,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=988890.0, ans=0.125 2024-08-11 07:37:59,676 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
32 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 07:38:06,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+01 2.737e+01 3.168e+01 3.571e+01 8.955e+01, threshold=6.335e+01, percent-clipped=2.0 2024-08-11 07:38:10,181 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 from AS 2024-08-11 07:38:11,488 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 from AS 2024-08-11 07:38:16,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=988990.0, ans=0.0 2024-08-11 07:38:20,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 11950, loss[loss=0.08773, beats_loss=0.01615, ecapa_loss=0.0001703, whisper_loss=0.06988, over 13586.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01147, ecapa_loss=0.0002065, whisper_loss=0.09345, over 3921896.25 frames. ], batch size: 54, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:38:37,752 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-11 07:38:39,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=989190.0, ans=0.1 2024-08-11 07:38:56,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=989290.0, ans=0.0 2024-08-11 07:38:58,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=989290.0, ans=0.1 2024-08-11 07:38:58,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.03 vs. 
limit=22.5 2024-08-11 07:39:01,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=989290.0, ans=0.5 2024-08-11 07:39:12,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=989390.0, ans=0.125 2024-08-11 07:39:12,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.02 vs. limit=22.5 2024-08-11 07:39:16,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=989390.0, ans=0.0 2024-08-11 07:39:27,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=989490.0, ans=0.125 2024-08-11 07:39:37,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12000, loss[loss=0.1028, beats_loss=0.01133, ecapa_loss=0.000233, whisper_loss=0.08914, over 18158.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01146, ecapa_loss=0.0002047, whisper_loss=0.09289, over 3902086.78 frames. ], batch size: 75, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:39:37,878 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 07:40:13,105 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on ASR_libri: loss=0.2587, beats_loss=0, ecapa_loss=0.0006674, whisper_loss=0.252, over 922467.00 frames. 2024-08-11 07:40:32,511 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on SV_voxceleb1: loss=0.005495, beats_loss=0, ecapa_loss=0.0005495, whisper_loss=0, over 939242.00 frames. 2024-08-11 07:42:18,208 INFO [train_multi_KD3.py:1149] (0/4) Epoch 7, validation on AT_audioset: loss=0.02554, beats_loss=0.02554, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 07:42:18,212 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 07:42:27,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=989590.0, ans=0.125 2024-08-11 07:42:37,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=989690.0, ans=0.1 2024-08-11 07:43:08,718 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 07:43:16,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=989890.0, ans=0.125 2024-08-11 07:43:21,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.165e+01 2.747e+01 3.219e+01 3.881e+01 9.695e+01, threshold=6.438e+01, percent-clipped=1.0 2024-08-11 07:43:23,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=989990.0, ans=0.025 2024-08-11 07:43:23,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=989990.0, ans=0.1 2024-08-11 07:43:32,598 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 from AS 2024-08-11 07:43:35,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12050, loss[loss=0.1082, beats_loss=0.01176, ecapa_loss=0.0002105, whisper_loss=0.09428, over 22019.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01149, ecapa_loss=0.0002049, whisper_loss=0.09271, over 3906356.21 frames. ], batch size: 87, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:44:05,043 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-11 07:44:13,955 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
21 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 07:44:33,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=990390.0, ans=0.125 2024-08-11 07:44:41,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=990490.0, ans=0.2 2024-08-11 07:44:50,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12100, loss[loss=0.1026, beats_loss=0.01228, ecapa_loss=0.0002174, whisper_loss=0.08818, over 22598.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01142, ecapa_loss=0.0002069, whisper_loss=0.09297, over 3882817.72 frames. ], batch size: 91, lr: 8.78e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:44:50,755 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 from AS 2024-08-11 07:44:56,335 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 from AS 2024-08-11 07:44:58,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=990590.0, ans=0.2 2024-08-11 07:45:05,458 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:45:08,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=990690.0, ans=0.2 2024-08-11 07:45:54,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.799e+01 3.089e+01 3.650e+01 5.391e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 07:46:10,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12150, loss[loss=0.1002, beats_loss=0.01399, ecapa_loss=0.0001657, whisper_loss=0.08458, over 16647.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01142, ecapa_loss=0.0002072, whisper_loss=0.09315, over 3874017.82 frames. 
], batch size: 65, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:46:15,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=991090.0, ans=0.0 2024-08-11 07:46:18,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2024-08-11 07:46:19,602 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 07:46:23,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=991090.0, ans=0.125 2024-08-11 07:47:16,290 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 07:47:30,479 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12200, loss[loss=0.1108, beats_loss=0.008218, ecapa_loss=0.0002191, whisper_loss=0.1004, over 14257.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01144, ecapa_loss=0.0002061, whisper_loss=0.09264, over 3864412.36 frames. ], batch size: 53, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:47:40,202 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 07:47:40,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=991590.0, ans=0.125 2024-08-11 07:47:40,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=991590.0, ans=0.0 2024-08-11 07:47:44,856 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 07:47:51,460 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 07:47:58,298 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.206e-01 2024-08-11 07:48:03,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=991790.0, ans=0.125 2024-08-11 07:48:26,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991890.0, ans=0.125 2024-08-11 07:48:34,579 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 07:48:35,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.629e+01 2.882e+01 3.326e+01 5.595e+01, threshold=5.765e+01, percent-clipped=0.0 2024-08-11 07:48:37,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=991990.0, ans=0.0 2024-08-11 07:48:43,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=991990.0, ans=0.0 2024-08-11 07:48:49,399 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12250, loss[loss=0.1173, beats_loss=0.008549, ecapa_loss=0.000253, whisper_loss=0.1062, over 18049.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0002065, whisper_loss=0.09352, over 3889363.74 frames. ], batch size: 76, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:48:55,489 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 07:48:57,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=992090.0, ans=0.2 2024-08-11 07:49:14,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.56 vs. 
limit=12.0 2024-08-11 07:49:14,708 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 07:49:21,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2024-08-11 07:49:41,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0 2024-08-11 07:49:47,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=992390.0, ans=0.125 2024-08-11 07:49:50,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=992490.0, ans=0.125 2024-08-11 07:49:54,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2024-08-11 07:50:06,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=992490.0, ans=0.2 2024-08-11 07:50:08,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12300, loss[loss=0.1127, beats_loss=0.01024, ecapa_loss=0.0002302, whisper_loss=0.1002, over 15519.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01139, ecapa_loss=0.0002074, whisper_loss=0.09302, over 3893577.92 frames. ], batch size: 63, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:50:13,321 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 07:50:15,302 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 07:50:18,642 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
20 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-11 07:50:22,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2024-08-11 07:50:31,609 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 07:50:33,174 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 07:50:38,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=992690.0, ans=0.09899494936611666 2024-08-11 07:50:52,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=992790.0, ans=0.09899494936611666 2024-08-11 07:50:59,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=992890.0, ans=0.125 2024-08-11 07:51:00,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2024-08-11 07:51:12,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.794e+01 3.118e+01 3.585e+01 7.136e+01, threshold=6.237e+01, percent-clipped=2.0 2024-08-11 07:51:17,336 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 07:51:27,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12350, loss[loss=0.1259, beats_loss=0.008662, ecapa_loss=0.0002262, whisper_loss=0.115, over 22136.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01131, ecapa_loss=0.0002083, whisper_loss=0.09363, over 3927872.99 frames. 
], batch size: 88, lr: 8.77e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:51:37,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993090.0, ans=0.1 2024-08-11 07:51:37,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2024-08-11 07:52:00,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=993290.0, ans=0.0 2024-08-11 07:52:02,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=993290.0, ans=0.125 2024-08-11 07:52:10,502 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 07:52:36,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=993490.0, ans=0.0 2024-08-11 07:52:41,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12400, loss[loss=0.1333, beats_loss=0.009778, ecapa_loss=0.0001899, whisper_loss=0.1216, over 24335.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002054, whisper_loss=0.09428, over 3921707.36 frames. ], batch size: 93, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:52:44,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=993590.0, ans=0.0 2024-08-11 07:52:44,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=993590.0, ans=0.125 2024-08-11 07:52:53,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=993590.0, ans=0.0 2024-08-11 07:53:07,138 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 07:53:07,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=993690.0, ans=0.125 2024-08-11 07:53:14,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=993790.0, ans=0.0 2024-08-11 07:53:35,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=993890.0, ans=0.125 2024-08-11 07:53:39,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=15.0 2024-08-11 07:53:47,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.944e+01 3.370e+01 3.888e+01 6.179e+01, threshold=6.739e+01, percent-clipped=0.0 2024-08-11 07:54:01,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12450, loss[loss=0.08901, beats_loss=0.01056, ecapa_loss=0.0003301, whisper_loss=0.07515, over 13510.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01138, ecapa_loss=0.0002047, whisper_loss=0.09394, over 3948167.16 frames. ], batch size: 61, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:54:07,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.76 vs. 
limit=15.0 2024-08-11 07:54:15,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=994090.0, ans=0.0 2024-08-11 07:54:32,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=994290.0, ans=0.125 2024-08-11 07:54:56,947 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:54:58,114 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 07:55:06,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=994490.0, ans=0.2 2024-08-11 07:55:09,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.24 vs. limit=15.0 2024-08-11 07:55:19,305 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12500, loss[loss=0.1003, beats_loss=0.01031, ecapa_loss=0.0002363, whisper_loss=0.08759, over 21938.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0114, ecapa_loss=0.000205, whisper_loss=0.09331, over 3936222.62 frames. ], batch size: 89, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:55:36,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 07:55:43,095 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-11 07:55:58,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=994790.0, ans=0.1 2024-08-11 07:56:23,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.789e+01 3.126e+01 3.797e+01 5.980e+01, threshold=6.252e+01, percent-clipped=0.0 2024-08-11 07:56:31,611 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 07:56:37,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12550, loss[loss=0.08039, beats_loss=0.01479, ecapa_loss=0.0001682, whisper_loss=0.06392, over 15356.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01138, ecapa_loss=0.0002057, whisper_loss=0.09362, over 3946693.57 frames. ], batch size: 64, lr: 8.76e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:56:41,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=995090.0, ans=0.125 2024-08-11 07:56:43,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995090.0, ans=0.1 2024-08-11 07:56:45,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=995090.0, ans=0.125 2024-08-11 07:57:03,012 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 07:57:20,961 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-11 07:57:23,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=995390.0, ans=0.125 2024-08-11 07:57:25,943 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
10 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 07:57:41,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=995490.0, ans=0.125 2024-08-11 07:57:41,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2024-08-11 07:57:48,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=995490.0, ans=0.125 2024-08-11 07:57:54,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2024-08-11 07:57:56,115 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12600, loss[loss=0.1274, beats_loss=0.009087, ecapa_loss=0.0002331, whisper_loss=0.116, over 22311.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01152, ecapa_loss=0.0002052, whisper_loss=0.0937, over 3946188.95 frames. ], batch size: 89, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:58:12,955 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 07:58:19,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=995690.0, ans=0.125 2024-08-11 07:58:56,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=995890.0, ans=0.125 2024-08-11 07:58:58,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=12.0 2024-08-11 07:59:00,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+01 2.576e+01 3.023e+01 3.555e+01 7.578e+01, threshold=6.047e+01, percent-clipped=3.0 2024-08-11 07:59:11,990 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 07:59:16,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=996090.0, ans=0.015 2024-08-11 07:59:17,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12650, loss[loss=0.1228, beats_loss=0.009897, ecapa_loss=0.000201, whisper_loss=0.1109, over 22744.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01149, ecapa_loss=0.0002054, whisper_loss=0.09365, over 3934105.62 frames. ], batch size: 89, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 07:59:19,660 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-11 07:59:20,944 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 07:59:23,838 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 08:00:15,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=996390.0, ans=0.2 2024-08-11 08:00:20,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-08-11 08:00:34,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=996490.0, ans=0.1 2024-08-11 08:00:35,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=996490.0, ans=0.1 2024-08-11 08:00:37,001 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 08:00:41,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=996590.0, ans=0.125 2024-08-11 08:00:42,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12700, loss[loss=0.1146, beats_loss=0.01264, ecapa_loss=0.0002088, whisper_loss=0.09984, over 21598.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01159, ecapa_loss=0.0002052, whisper_loss=0.09257, over 3902598.96 frames. ], batch size: 87, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:00:50,745 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 08:00:59,746 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:01:37,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=996890.0, ans=0.2 2024-08-11 08:01:50,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=996990.0, ans=0.04949747468305833 2024-08-11 08:01:53,858 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.625e+01 2.937e+01 3.351e+01 6.413e+01, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 08:02:06,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=996990.0, ans=0.0 2024-08-11 08:02:10,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12750, loss[loss=0.0897, beats_loss=0.01178, ecapa_loss=0.0002007, whisper_loss=0.07592, over 15017.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01157, ecapa_loss=0.0002052, whisper_loss=0.09301, over 3880401.41 frames. 
], batch size: 60, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:02:16,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=997090.0, ans=0.05 2024-08-11 08:02:40,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=997190.0, ans=0.0 2024-08-11 08:02:43,959 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-11 08:02:49,129 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.889e-01 2024-08-11 08:03:32,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12800, loss[loss=0.09831, beats_loss=0.01371, ecapa_loss=0.0001904, whisper_loss=0.0827, over 17381.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0115, ecapa_loss=0.0002066, whisper_loss=0.09261, over 3889942.39 frames. ], batch size: 72, lr: 8.75e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:03:36,310 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 08:03:41,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=997590.0, ans=0.125 2024-08-11 08:03:43,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.87 vs. 
limit=22.5 2024-08-11 08:03:55,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=997690.0, ans=0.125 2024-08-11 08:04:42,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.631e+01 3.014e+01 3.452e+01 5.658e+01, threshold=6.028e+01, percent-clipped=0.0 2024-08-11 08:04:56,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12850, loss[loss=0.1086, beats_loss=0.01083, ecapa_loss=0.0002103, whisper_loss=0.09565, over 20894.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0115, ecapa_loss=0.0002067, whisper_loss=0.09234, over 3860275.29 frames. ], batch size: 81, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:05:13,768 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 08:05:23,371 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 08:05:26,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=998290.0, ans=0.125 2024-08-11 08:05:34,128 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 08:05:43,536 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 08:05:52,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=998390.0, ans=0.125 2024-08-11 08:05:52,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=998390.0, ans=0.0 2024-08-11 08:05:58,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-11 08:06:14,189 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 08:06:16,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=998590.0, ans=0.2 2024-08-11 08:06:17,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12900, loss[loss=0.0998, beats_loss=0.012, ecapa_loss=0.0002154, whisper_loss=0.08565, over 22682.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01146, ecapa_loss=0.0002073, whisper_loss=0.09217, over 3852512.10 frames. ], batch size: 90, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:06:17,389 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 08:06:22,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=998590.0, ans=0.2 2024-08-11 08:06:52,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=998790.0, ans=0.0 2024-08-11 08:07:19,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=998890.0, ans=0.125 2024-08-11 08:07:24,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.613e+01 2.962e+01 3.305e+01 5.857e+01, threshold=5.923e+01, percent-clipped=0.0 2024-08-11 08:07:25,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-11 08:07:41,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 12950, loss[loss=0.128, beats_loss=0.008757, ecapa_loss=0.0001773, whisper_loss=0.1175, over 16681.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01142, ecapa_loss=0.0002067, whisper_loss=0.09212, over 3842335.10 frames. 
], batch size: 61, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:07:42,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=999090.0, ans=0.125 2024-08-11 08:07:51,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=999090.0, ans=0.125 2024-08-11 08:07:57,485 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 08:08:07,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999190.0, ans=0.1 2024-08-11 08:08:20,230 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.382e+05 2024-08-11 08:08:27,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-11 08:08:43,509 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 08:08:43,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=999390.0, ans=0.1 2024-08-11 08:08:54,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=999490.0, ans=0.125 2024-08-11 08:09:11,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13000, loss[loss=0.1205, beats_loss=0.01071, ecapa_loss=0.0002355, whisper_loss=0.1074, over 21632.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01145, ecapa_loss=0.0002068, whisper_loss=0.09197, over 3849405.36 frames. ], batch size: 90, lr: 8.74e-03, grad_scale: 1.8014398509481984e+16 2024-08-11 08:09:18,876 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 08:09:47,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=999790.0, ans=0.2 2024-08-11 08:10:00,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=999790.0, ans=0.2 2024-08-11 08:10:02,062 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 08:10:17,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=999890.0, ans=0.0 2024-08-11 08:10:17,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=999890.0, ans=0.2 2024-08-11 08:10:21,339 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-100000.pt 2024-08-11 08:10:25,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.746e+01 3.044e+01 3.535e+01 5.645e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:10:27,896 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 08:10:28,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=999990.0, ans=0.0 2024-08-11 08:10:36,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=999990.0, ans=0.125 2024-08-11 08:10:39,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13050, loss[loss=0.127, beats_loss=0.008789, ecapa_loss=0.0002222, whisper_loss=0.116, over 17493.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01149, ecapa_loss=0.0002065, whisper_loss=0.09224, over 3844477.96 frames. ], batch size: 67, lr: 8.74e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:10:49,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1000090.0, ans=0.125 2024-08-11 08:10:54,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-11 08:11:16,158 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 08:11:16,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1000290.0, ans=0.05 2024-08-11 08:11:18,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1000290.0, ans=0.1 2024-08-11 08:11:27,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1000390.0, ans=0.125 2024-08-11 08:11:47,304 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 12 from LS+wenet, 10 from Vox, 44 fro AS 2024-08-11 08:11:55,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13100, loss[loss=0.1273, beats_loss=0.008637, ecapa_loss=0.0002579, whisper_loss=0.116, over 22753.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01152, ecapa_loss=0.0002057, whisper_loss=0.09233, over 3843041.70 frames. 
], batch size: 94, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:12:10,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1000690.0, ans=0.02 2024-08-11 08:12:14,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000690.0, ans=0.125 2024-08-11 08:12:18,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1000690.0, ans=0.125 2024-08-11 08:12:23,938 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 08:12:38,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1000890.0, ans=0.125 2024-08-11 08:12:47,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1000890.0, ans=0.0 2024-08-11 08:12:49,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1000890.0, ans=0.125 2024-08-11 08:12:54,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.157e+01 2.920e+01 3.431e+01 3.898e+01 1.839e+02, threshold=6.862e+01, percent-clipped=3.0 2024-08-11 08:13:07,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13150, loss[loss=0.09063, beats_loss=0.01509, ecapa_loss=0.0001632, whisper_loss=0.07391, over 19756.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0115, ecapa_loss=0.0002039, whisper_loss=0.09286, over 3853510.81 frames. 
], batch size: 75, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:13:16,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001090.0, ans=0.1 2024-08-11 08:13:19,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1001090.0, ans=0.0 2024-08-11 08:13:46,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1001290.0, ans=0.125 2024-08-11 08:14:12,375 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 08:14:16,813 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 17 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 08:14:20,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13200, loss[loss=0.08672, beats_loss=0.01228, ecapa_loss=0.0001928, whisper_loss=0.07252, over 16577.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01153, ecapa_loss=0.0002049, whisper_loss=0.09236, over 3817841.10 frames. ], batch size: 66, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:14:29,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1001590.0, ans=0.125 2024-08-11 08:14:30,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0 2024-08-11 08:14:36,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1001690.0, ans=0.125 2024-08-11 08:15:04,029 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 08:15:06,585 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 08:15:17,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1001890.0, ans=0.125 2024-08-11 08:15:20,170 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 08:15:22,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.762e+01 3.091e+01 3.560e+01 4.785e+01, threshold=6.182e+01, percent-clipped=0.0 2024-08-11 08:15:23,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1001990.0, ans=0.1 2024-08-11 08:15:23,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=15.0 2024-08-11 08:15:36,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13250, loss[loss=0.1236, beats_loss=0.009017, ecapa_loss=0.0001807, whisper_loss=0.1128, over 20092.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01149, ecapa_loss=0.0002057, whisper_loss=0.09297, over 3815651.72 frames. ], batch size: 73, lr: 8.73e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:15:41,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2024-08-11 08:16:26,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-11 08:16:27,262 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-11 08:16:36,451 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-11 08:16:41,223 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 08:16:51,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13300, loss[loss=0.09914, beats_loss=0.01571, ecapa_loss=0.0001874, whisper_loss=0.08156, over 18324.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01147, ecapa_loss=0.0002041, whisper_loss=0.09346, over 3801944.02 frames. ], batch size: 74, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:16:53,072 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 08:16:59,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1002590.0, ans=0.125 2024-08-11 08:17:20,128 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 08:17:39,525 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 08:17:40,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=12.0 2024-08-11 08:17:51,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1002890.0, ans=0.125 2024-08-11 08:17:55,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.657e+01 3.097e+01 3.589e+01 1.012e+02, threshold=6.194e+01, percent-clipped=1.0 2024-08-11 08:18:02,180 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 08:18:10,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13350, loss[loss=0.09041, beats_loss=0.01277, ecapa_loss=0.0001582, whisper_loss=0.07606, over 19753.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0115, ecapa_loss=0.0002026, whisper_loss=0.09354, over 3798383.60 frames. 
], batch size: 78, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:18:16,624 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 08:18:30,615 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 9 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 08:18:32,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1003190.0, ans=0.125 2024-08-11 08:18:33,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2024-08-11 08:18:39,262 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 08:19:04,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1003390.0, ans=0.125 2024-08-11 08:19:28,191 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 08:19:28,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1003590.0, ans=0.125 2024-08-11 08:19:29,348 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13400, loss[loss=0.1106, beats_loss=0.01001, ecapa_loss=0.0002526, whisper_loss=0.09808, over 16878.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01145, ecapa_loss=0.0002036, whisper_loss=0.09375, over 3807724.64 frames. ], batch size: 68, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:19:32,348 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 08:19:33,991 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 08:19:37,965 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 08:19:39,949 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 08:19:46,284 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 08:20:22,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-11 08:20:28,285 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 08:20:34,180 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.185e+01 2.700e+01 3.139e+01 3.511e+01 8.019e+01, threshold=6.278e+01, percent-clipped=1.0 2024-08-11 08:20:41,777 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 08:20:47,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13450, loss[loss=0.1191, beats_loss=0.008454, ecapa_loss=0.0002834, whisper_loss=0.1078, over 13746.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01146, ecapa_loss=0.0002047, whisper_loss=0.09393, over 3831192.66 frames. ], batch size: 55, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:21:24,697 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.824e+05 2024-08-11 08:21:53,009 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 17 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 08:21:54,294 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 08:21:57,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1004490.0, ans=0.1 2024-08-11 08:21:59,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1004490.0, ans=0.125 2024-08-11 08:22:00,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=15.0 2024-08-11 08:22:01,444 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-11 08:22:05,951 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13500, loss[loss=0.1106, beats_loss=0.01183, ecapa_loss=0.0002232, whisper_loss=0.09655, over 21732.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01148, ecapa_loss=0.0002057, whisper_loss=0.0934, over 3843595.86 frames. ], batch size: 89, lr: 8.72e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:22:10,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. 
limit=15.0 2024-08-11 08:22:13,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1004590.0, ans=0.125 2024-08-11 08:22:20,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1004690.0, ans=0.0 2024-08-11 08:22:22,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1004690.0, ans=0.2 2024-08-11 08:22:28,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1004690.0, ans=0.125 2024-08-11 08:22:40,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1004790.0, ans=0.07 2024-08-11 08:22:43,387 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 08:22:46,186 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 08:22:47,665 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 08:22:50,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1004890.0, ans=0.05 2024-08-11 08:23:04,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.720e+01 3.065e+01 3.481e+01 5.636e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-11 08:23:05,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-11 08:23:13,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. 
limit=10.0 2024-08-11 08:23:14,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1004990.0, ans=0.125 2024-08-11 08:23:17,351 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 08:23:17,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:17,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1005090.0, ans=0.07 2024-08-11 08:23:18,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13550, loss[loss=0.1251, beats_loss=0.01041, ecapa_loss=0.0002268, whisper_loss=0.1124, over 23226.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01141, ecapa_loss=0.0002051, whisper_loss=0.09363, over 3874868.57 frames. ], batch size: 94, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:23:20,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1005090.0, ans=0.125 2024-08-11 08:23:21,255 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 08:23:31,122 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 08:24:17,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1005490.0, ans=0.07 2024-08-11 08:24:20,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-11 08:24:32,096 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13600, loss[loss=0.1109, beats_loss=0.01126, ecapa_loss=0.0002257, whisper_loss=0.09733, over 17361.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01139, ecapa_loss=0.0002042, whisper_loss=0.09374, over 3893909.25 frames. ], batch size: 68, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:24:38,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1005590.0, ans=0.125 2024-08-11 08:24:45,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1005690.0, ans=0.025 2024-08-11 08:25:02,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1005790.0, ans=0.0 2024-08-11 08:25:13,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1005790.0, ans=0.0 2024-08-11 08:25:31,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+01 2.811e+01 3.158e+01 3.669e+01 1.616e+02, threshold=6.317e+01, percent-clipped=3.0 2024-08-11 08:25:32,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1005990.0, ans=0.125 2024-08-11 08:25:44,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13650, loss[loss=0.09517, beats_loss=0.01187, ecapa_loss=0.00027, whisper_loss=0.0806, over 16523.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01144, ecapa_loss=0.0002051, whisper_loss=0.0939, over 3886579.48 frames. 
], batch size: 70, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:25:44,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1006090.0, ans=0.1 2024-08-11 08:25:57,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1006090.0, ans=0.2 2024-08-11 08:25:58,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1006190.0, ans=0.125 2024-08-11 08:26:00,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1006190.0, ans=0.125 2024-08-11 08:26:26,569 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 08:26:42,019 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 08:26:43,349 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 08:26:50,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-11 08:27:00,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13700, loss[loss=0.1285, beats_loss=0.01094, ecapa_loss=0.0001704, whisper_loss=0.1158, over 23690.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01152, ecapa_loss=0.0002035, whisper_loss=0.09374, over 3879056.57 frames. ], batch size: 90, lr: 8.71e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:27:02,456 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
15 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 08:27:08,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1006590.0, ans=0.125 2024-08-11 08:27:16,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1006690.0, ans=0.2 2024-08-11 08:27:32,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.37 vs. limit=8.0 2024-08-11 08:27:42,410 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 08:27:54,334 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 08:27:58,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1006890.0, ans=0.125 2024-08-11 08:28:02,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.699e+01 3.024e+01 3.641e+01 8.253e+01, threshold=6.049e+01, percent-clipped=1.0 2024-08-11 08:28:13,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1006990.0, ans=0.125 2024-08-11 08:28:15,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13750, loss[loss=0.1133, beats_loss=0.01212, ecapa_loss=0.000184, whisper_loss=0.09936, over 14776.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01142, ecapa_loss=0.000205, whisper_loss=0.09417, over 3870751.59 frames. ], batch size: 57, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:28:34,749 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 08:28:43,111 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 08:28:59,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1007390.0, ans=0.125 2024-08-11 08:29:00,734 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 08:29:02,054 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 08:29:07,643 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 08:29:14,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1007490.0, ans=0.125 2024-08-11 08:29:30,269 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13800, loss[loss=0.09534, beats_loss=0.01409, ecapa_loss=0.0001802, whisper_loss=0.07945, over 22237.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01131, ecapa_loss=0.0002048, whisper_loss=0.09479, over 3870012.61 frames. ], batch size: 93, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:29:32,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1007590.0, ans=0.2 2024-08-11 08:29:38,530 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 08:29:40,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1007590.0, ans=0.125 2024-08-11 08:29:43,884 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 08:29:44,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2024-08-11 08:30:17,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-08-11 08:30:20,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1007890.0, ans=0.0 2024-08-11 08:30:28,513 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 08:30:31,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1007890.0, ans=0.05 2024-08-11 08:30:35,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.572e+01 2.803e+01 3.077e+01 5.296e+01, threshold=5.605e+01, percent-clipped=0.0 2024-08-11 08:30:49,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13850, loss[loss=0.09515, beats_loss=0.0101, ecapa_loss=0.0001936, whisper_loss=0.08312, over 13965.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01136, ecapa_loss=0.000203, whisper_loss=0.09466, over 3883934.43 frames. ], batch size: 55, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:30:50,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1008090.0, ans=0.125 2024-08-11 08:30:55,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1008090.0, ans=0.125 2024-08-11 08:30:59,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1008090.0, ans=0.125 2024-08-11 08:30:59,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1008090.0, ans=0.0 2024-08-11 08:31:18,744 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 08:31:24,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=15.0 2024-08-11 08:31:27,400 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:31:43,921 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 08:31:44,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1008390.0, ans=0.2 2024-08-11 08:31:48,670 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 08:32:02,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1008490.0, ans=0.1 2024-08-11 08:32:06,094 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 08:32:10,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13900, loss[loss=0.1124, beats_loss=0.01272, ecapa_loss=0.0001992, whisper_loss=0.09768, over 22402.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01136, ecapa_loss=0.0002034, whisper_loss=0.0943, over 3853801.54 frames. ], batch size: 92, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:32:29,367 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 08:32:31,192 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 08:32:32,309 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 08:32:58,395 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 08:32:58,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1008890.0, ans=0.125 2024-08-11 08:33:09,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1008890.0, ans=0.0 2024-08-11 08:33:14,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.194e+01 2.808e+01 3.104e+01 3.560e+01 5.037e+01, threshold=6.208e+01, percent-clipped=0.0 2024-08-11 08:33:19,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-11 08:33:21,695 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 08:33:28,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 13950, loss[loss=0.1026, beats_loss=0.00956, ecapa_loss=0.0002166, whisper_loss=0.09087, over 16247.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0002027, whisper_loss=0.0938, over 3836624.70 frames. ], batch size: 65, lr: 8.70e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:33:49,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2024-08-11 08:34:12,563 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 08:34:16,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-11 08:34:28,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1009390.0, ans=0.125 2024-08-11 08:34:31,461 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 08:34:41,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1009490.0, ans=0.125 2024-08-11 08:34:48,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14000, loss[loss=0.1355, beats_loss=0.008405, ecapa_loss=0.0002168, whisper_loss=0.1249, over 19694.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01143, ecapa_loss=0.000202, whisper_loss=0.09384, over 3877345.68 frames. ], batch size: 78, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:34:48,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1009590.0, ans=0.1 2024-08-11 08:34:59,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1009590.0, ans=0.125 2024-08-11 08:35:08,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1009690.0, ans=0.0 2024-08-11 08:35:23,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1009790.0, ans=0.125 2024-08-11 08:35:41,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1009890.0, ans=0.125 2024-08-11 08:35:57,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.710e+01 3.006e+01 3.538e+01 6.784e+01, threshold=6.013e+01, percent-clipped=1.0 2024-08-11 08:36:12,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14050, loss[loss=0.08736, beats_loss=0.01125, ecapa_loss=0.0001699, whisper_loss=0.07441, over 15509.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002026, whisper_loss=0.09388, over 3867339.22 frames. 
], batch size: 60, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:36:18,533 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 08:36:57,827 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 08:37:11,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1010390.0, ans=0.125 2024-08-11 08:37:37,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14100, loss[loss=0.09535, beats_loss=0.01357, ecapa_loss=0.0002031, whisper_loss=0.07974, over 21202.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01145, ecapa_loss=0.0002023, whisper_loss=0.09352, over 3857257.98 frames. ], batch size: 90, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:37:59,800 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.408e-01 2024-08-11 08:38:14,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-11 08:38:25,170 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-11 08:38:30,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-11 08:38:49,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.624e+01 2.945e+01 3.408e+01 4.744e+01, threshold=5.889e+01, percent-clipped=0.0 2024-08-11 08:38:56,412 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 08:39:05,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14150, loss[loss=0.09416, beats_loss=0.01141, ecapa_loss=0.0001747, whisper_loss=0.08101, over 15643.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01149, ecapa_loss=0.0002013, whisper_loss=0.09397, over 3857428.00 frames. ], batch size: 60, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:39:08,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=12.0 2024-08-11 08:39:17,839 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 8 from Vox, 26 fro AS 2024-08-11 08:39:19,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1011190.0, ans=0.2 2024-08-11 08:39:21,092 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 17 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 08:39:52,734 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 08:40:07,968 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 08:40:19,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011490.0, ans=0.1 2024-08-11 08:40:31,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14200, loss[loss=0.1149, beats_loss=0.01117, ecapa_loss=0.0001888, whisper_loss=0.1019, over 22797.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01145, ecapa_loss=0.0002013, whisper_loss=0.09393, over 3870782.65 frames. ], batch size: 88, lr: 8.69e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:40:37,593 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 08:41:12,053 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 08:41:17,224 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 08:41:25,346 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 08:41:29,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=15.0 2024-08-11 08:42:12,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.734e+01 3.043e+01 3.584e+01 5.331e+01, threshold=6.086e+01, percent-clipped=0.0 2024-08-11 08:42:26,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1011990.0, ans=0.2 2024-08-11 08:42:29,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14250, loss[loss=0.101, beats_loss=0.01128, ecapa_loss=0.0002201, whisper_loss=0.08755, over 15790.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0002008, whisper_loss=0.09359, over 3891693.95 frames. ], batch size: 61, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:42:36,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1012090.0, ans=0.125 2024-08-11 08:43:17,652 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 08:43:20,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.64 vs. limit=10.0 2024-08-11 08:43:44,739 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 28 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 08:43:52,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2024-08-11 08:43:57,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14300, loss[loss=0.09851, beats_loss=0.01253, ecapa_loss=0.000201, whisper_loss=0.08396, over 21640.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0002003, whisper_loss=0.09387, over 3889456.78 frames. ], batch size: 90, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:44:01,460 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 08:44:01,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1012590.0, ans=0.125 2024-08-11 08:44:04,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1012590.0, ans=0.125 2024-08-11 08:44:06,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1012590.0, ans=0.07 2024-08-11 08:44:19,402 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 08:44:28,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1012690.0, ans=10.0 2024-08-11 08:44:31,278 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-11 08:44:32,613 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 08:44:41,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.38 vs. limit=10.0 2024-08-11 08:44:47,583 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 08:44:57,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. 
limit=15.0 2024-08-11 08:45:05,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.111e+01 2.720e+01 3.044e+01 3.421e+01 5.497e+01, threshold=6.088e+01, percent-clipped=0.0 2024-08-11 08:45:07,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-11 08:45:15,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1012990.0, ans=0.125 2024-08-11 08:45:19,354 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14350, loss[loss=0.1172, beats_loss=0.009813, ecapa_loss=0.0002222, whisper_loss=0.1052, over 15962.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0002001, whisper_loss=0.09368, over 3901752.49 frames. ], batch size: 63, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:45:20,003 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 08:45:21,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1013090.0, ans=0.1 2024-08-11 08:45:28,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-08-11 08:45:45,576 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 08:45:48,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2024-08-11 08:45:51,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1013290.0, ans=0.1 2024-08-11 08:45:58,408 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 08:46:38,184 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 08:46:41,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14400, loss[loss=0.1009, beats_loss=0.01346, ecapa_loss=0.0001934, whisper_loss=0.08553, over 21749.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01142, ecapa_loss=0.0002012, whisper_loss=0.0934, over 3906352.40 frames. ], batch size: 89, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:46:42,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1013590.0, ans=0.125 2024-08-11 08:46:58,727 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 08:47:46,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.704e+01 3.131e+01 3.618e+01 5.413e+01, threshold=6.263e+01, percent-clipped=0.0 2024-08-11 08:48:00,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 7, batch 14450, loss[loss=0.1319, beats_loss=0.008234, ecapa_loss=0.0002481, whisper_loss=0.1212, over 21161.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01139, ecapa_loss=0.0002019, whisper_loss=0.09365, over 3899805.11 frames. ], batch size: 86, lr: 8.68e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:48:08,943 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 08:48:19,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014190.0, ans=0.1 2024-08-11 08:48:31,148 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
15 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-11 08:48:59,454 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-7.pt 2024-08-11 08:49:46,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 0, loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001787, whisper_loss=0.09028, over 18281.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01067, ecapa_loss=0.0001787, whisper_loss=0.09028, over 18281.00 frames. ], batch size: 68, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:49:46,119 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 08:50:29,084 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on ASR_libri: loss=0.2579, beats_loss=0, ecapa_loss=0.0006499, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 08:50:45,271 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on SV_voxceleb1: loss=0.005446, beats_loss=0, ecapa_loss=0.0005446, whisper_loss=0, over 939242.00 frames. 2024-08-11 08:52:49,606 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on AT_audioset: loss=0.02532, beats_loss=0.02532, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 08:52:49,615 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 08:52:56,969 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 08:53:20,284 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-11 08:54:01,223 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 08:54:24,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1014770.0, ans=0.0 2024-08-11 08:54:50,630 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:55:01,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.68 vs. limit=10.0 2024-08-11 08:55:05,095 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 50, loss[loss=0.1123, beats_loss=0.01169, ecapa_loss=0.0001617, whisper_loss=0.09897, over 24537.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01069, ecapa_loss=0.0002086, whisper_loss=0.09228, over 860606.44 frames. ], batch size: 93, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:55:07,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1014970.0, ans=0.125 2024-08-11 08:55:12,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.434e+01 2.926e+01 3.335e+01 3.829e+01 6.583e+01, threshold=6.671e+01, percent-clipped=1.0 2024-08-11 08:55:22,778 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=6.085e-02 2024-08-11 08:56:05,092 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 08:56:21,899 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 08:56:33,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1015270.0, ans=0.0 2024-08-11 08:56:40,542 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 08:56:59,247 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 08:57:04,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-11 08:57:07,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 100, loss[loss=0.09107, beats_loss=0.01089, ecapa_loss=0.0002099, whisper_loss=0.07808, over 21131.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01069, ecapa_loss=0.0002082, whisper_loss=0.09221, over 1531772.52 frames. ], batch size: 87, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:57:07,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1015470.0, ans=0.125 2024-08-11 08:57:23,267 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 08:57:54,581 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.667e-01 2024-08-11 08:58:09,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1015670.0, ans=0.125 2024-08-11 08:58:11,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1015670.0, ans=0.05 2024-08-11 08:58:24,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2024-08-11 08:58:37,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 08:58:43,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1015870.0, ans=0.125 2024-08-11 08:58:49,269 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
11 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 08:58:58,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 150, loss[loss=0.1088, beats_loss=0.01016, ecapa_loss=0.0001815, whisper_loss=0.09684, over 24103.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0002057, whisper_loss=0.0914, over 2033208.93 frames. ], batch size: 92, lr: 8.17e-03, grad_scale: 3.602879701896397e+16 2024-08-11 08:58:59,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1015970.0, ans=0.125 2024-08-11 08:59:02,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1015970.0, ans=0.125 2024-08-11 08:59:04,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.466e+01 2.999e+01 3.323e+01 3.859e+01 6.934e+01, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 08:59:09,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2024-08-11 08:59:11,285 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-11 08:59:18,186 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 08:59:49,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-11 08:59:52,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1016270.0, ans=0.125 2024-08-11 09:00:08,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. 
limit=12.0 2024-08-11 09:00:12,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016370.0, ans=0.1 2024-08-11 09:00:18,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1016370.0, ans=0.125 2024-08-11 09:00:24,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 200, loss[loss=0.1111, beats_loss=0.01063, ecapa_loss=0.0002002, whisper_loss=0.0985, over 21519.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01088, ecapa_loss=0.0002037, whisper_loss=0.09341, over 2452428.29 frames. ], batch size: 82, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:00:26,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1016470.0, ans=0.025 2024-08-11 09:00:53,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016570.0, ans=0.1 2024-08-11 09:01:30,634 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-11 09:01:35,305 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 09:01:40,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2024-08-11 09:01:41,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1016870.0, ans=0.0 2024-08-11 09:01:44,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 250, loss[loss=0.1102, beats_loss=0.01061, ecapa_loss=0.0001869, whisper_loss=0.09769, over 16396.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01082, ecapa_loss=0.0002033, whisper_loss=0.09407, over 2721434.10 frames. 
], batch size: 63, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:01:48,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.577e+01 2.891e+01 3.229e+01 6.128e+01, threshold=5.781e+01, percent-clipped=0.0 2024-08-11 09:01:54,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1016970.0, ans=0.125 2024-08-11 09:02:05,690 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 09:02:35,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-11 09:02:38,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017270.0, ans=0.1 2024-08-11 09:02:47,347 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 09:03:01,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 300, loss[loss=0.1036, beats_loss=0.01139, ecapa_loss=0.000212, whisper_loss=0.09014, over 22307.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01081, ecapa_loss=0.000204, whisper_loss=0.09373, over 2942528.00 frames. ], batch size: 90, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:03:12,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1017470.0, ans=0.0 2024-08-11 09:03:36,105 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-11 09:03:40,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1017670.0, ans=0.1 2024-08-11 09:03:47,637 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-11 09:04:17,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 350, loss[loss=0.08332, beats_loss=0.01258, ecapa_loss=0.0002129, whisper_loss=0.0686, over 19686.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01094, ecapa_loss=0.0002039, whisper_loss=0.09303, over 3137007.62 frames. ], batch size: 81, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:04:22,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.490e+01 2.836e+01 3.239e+01 6.329e+01, threshold=5.671e+01, percent-clipped=2.0 2024-08-11 09:04:27,079 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 09:04:27,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1017970.0, ans=0.0 2024-08-11 09:04:34,119 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 09:04:34,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1018070.0, ans=0.125 2024-08-11 09:04:50,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1018170.0, ans=0.125 2024-08-11 09:04:54,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. 
limit=15.0 2024-08-11 09:05:17,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1018370.0, ans=0.0 2024-08-11 09:05:17,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1018370.0, ans=0.1 2024-08-11 09:05:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1018370.0, ans=0.05 2024-08-11 09:05:32,311 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 09:05:33,327 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 400, loss[loss=0.08458, beats_loss=0.0137, ecapa_loss=0.0001987, whisper_loss=0.06889, over 22465.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01104, ecapa_loss=0.0002029, whisper_loss=0.09292, over 3326328.02 frames. ], batch size: 94, lr: 8.16e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:05:40,557 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 09:05:43,578 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.998e-01 2024-08-11 09:05:44,540 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 09:06:16,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1018670.0, ans=0.125 2024-08-11 09:06:22,116 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
20 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 09:06:45,407 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:06:51,074 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 450, loss[loss=0.1116, beats_loss=0.01081, ecapa_loss=0.0002229, whisper_loss=0.09859, over 23510.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0002025, whisper_loss=0.0929, over 3436821.53 frames. ], batch size: 93, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:06:54,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1018970.0, ans=0.2 2024-08-11 09:06:55,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.612e+01 2.893e+01 3.369e+01 4.521e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-11 09:07:00,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1018970.0, ans=0.125 2024-08-11 09:07:00,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1018970.0, ans=0.0 2024-08-11 09:07:10,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1019070.0, ans=0.07 2024-08-11 09:07:12,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. 
limit=12.0 2024-08-11 09:07:44,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019270.0, ans=0.1 2024-08-11 09:08:00,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1019370.0, ans=0.1 2024-08-11 09:08:09,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 500, loss[loss=0.1277, beats_loss=0.01164, ecapa_loss=0.0001891, whisper_loss=0.1142, over 17823.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01117, ecapa_loss=0.0002008, whisper_loss=0.09276, over 3526262.93 frames. ], batch size: 66, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:08:21,817 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 11 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 09:08:31,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019570.0, ans=0.1 2024-08-11 09:08:32,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1019570.0, ans=0.125 2024-08-11 09:08:37,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=15.0 2024-08-11 09:08:46,597 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-11 09:09:12,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1019770.0, ans=0.125 2024-08-11 09:09:15,725 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 09:09:30,877 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 09:09:32,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 550, loss[loss=0.1141, beats_loss=0.008277, ecapa_loss=0.0001968, whisper_loss=0.1038, over 15795.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001995, whisper_loss=0.09292, over 3588586.93 frames. ], batch size: 61, lr: 8.15e-03, grad_scale: 3.602879701896397e+16 2024-08-11 09:09:37,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.198e+01 2.649e+01 3.106e+01 3.487e+01 7.469e+01, threshold=6.212e+01, percent-clipped=4.0 2024-08-11 09:09:37,747 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 09:09:43,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019970.0, ans=0.1 2024-08-11 09:09:43,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1019970.0, ans=0.1 2024-08-11 09:09:44,801 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 09:10:06,545 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 09:10:15,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1020170.0, ans=0.1 2024-08-11 09:10:47,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 600, loss[loss=0.1002, beats_loss=0.01138, ecapa_loss=0.0002355, whisper_loss=0.08649, over 16606.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0111, ecapa_loss=0.0001977, whisper_loss=0.09321, over 3688303.43 frames. 
], batch size: 70, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:10:57,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1020470.0, ans=0.0 2024-08-11 09:11:00,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-08-11 09:11:04,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0 2024-08-11 09:11:16,057 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-11 09:11:19,151 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 09:11:32,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1020670.0, ans=0.0 2024-08-11 09:11:42,734 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 09:12:01,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.89 vs. limit=22.5 2024-08-11 09:12:03,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1020870.0, ans=0.2 2024-08-11 09:12:06,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 650, loss[loss=0.09389, beats_loss=0.01493, ecapa_loss=0.0001401, whisper_loss=0.07756, over 18162.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0112, ecapa_loss=0.0001962, whisper_loss=0.09267, over 3689713.85 frames. ], batch size: 70, lr: 8.15e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:12:06,288 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 09:12:10,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.148e+01 2.651e+01 2.850e+01 3.204e+01 4.737e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 09:12:10,822 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 09:12:26,233 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 09:12:32,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021070.0, ans=0.1 2024-08-11 09:12:33,473 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 09:12:35,231 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 09:12:39,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1021170.0, ans=0.125 2024-08-11 09:12:39,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021170.0, ans=0.1 2024-08-11 09:12:44,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1021170.0, ans=0.125 2024-08-11 09:12:48,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2024-08-11 09:12:58,038 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 09:12:58,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1021270.0, ans=0.125 2024-08-11 09:13:17,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1021370.0, ans=0.125 2024-08-11 09:13:21,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 700, loss[loss=0.08467, beats_loss=0.009119, ecapa_loss=0.0002571, whisper_loss=0.07298, over 15512.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01119, ecapa_loss=0.0001973, whisper_loss=0.09218, over 3721701.84 frames. ], batch size: 63, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:13:25,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1021470.0, ans=0.2 2024-08-11 09:13:34,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1021470.0, ans=0.125 2024-08-11 09:13:42,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1021570.0, ans=0.125 2024-08-11 09:14:02,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1021670.0, ans=0.2 2024-08-11 09:14:26,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1021870.0, ans=0.125 2024-08-11 09:14:37,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 750, loss[loss=0.1006, beats_loss=0.01352, ecapa_loss=0.0001482, whisper_loss=0.08556, over 19859.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0113, ecapa_loss=0.000195, whisper_loss=0.09152, over 3735505.57 frames. 
], batch size: 77, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:14:42,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.660e+01 3.127e+01 3.627e+01 6.783e+01, threshold=6.254e+01, percent-clipped=6.0 2024-08-11 09:14:44,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1021970.0, ans=0.0 2024-08-11 09:15:01,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1022070.0, ans=0.125 2024-08-11 09:15:19,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1022170.0, ans=0.0 2024-08-11 09:15:20,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1022170.0, ans=0.0 2024-08-11 09:15:33,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2024-08-11 09:15:42,476 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 09:15:51,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1022370.0, ans=0.1 2024-08-11 09:15:52,791 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 40 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-11 09:15:54,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 800, loss[loss=0.1521, beats_loss=0.009492, ecapa_loss=0.0001782, whisper_loss=0.1408, over 22464.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01135, ecapa_loss=0.0001951, whisper_loss=0.09111, over 3783018.35 frames. 
], batch size: 80, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:16:15,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1022570.0, ans=0.0 2024-08-11 09:16:27,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1022670.0, ans=0.2 2024-08-11 09:16:28,172 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 09:16:34,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022670.0, ans=0.1 2024-08-11 09:16:45,879 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 09:16:54,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1022870.0, ans=0.1 2024-08-11 09:17:01,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1022870.0, ans=0.07 2024-08-11 09:17:07,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 850, loss[loss=0.1062, beats_loss=0.01055, ecapa_loss=0.0001662, whisper_loss=0.094, over 15712.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01134, ecapa_loss=0.0001945, whisper_loss=0.09111, over 3777994.37 frames. 
], batch size: 59, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:17:11,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.661e+01 2.916e+01 3.361e+01 8.910e+01, threshold=5.831e+01, percent-clipped=1.0 2024-08-11 09:17:20,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1023070.0, ans=0.125 2024-08-11 09:17:40,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1023170.0, ans=15.0 2024-08-11 09:17:44,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1023170.0, ans=0.0 2024-08-11 09:17:55,659 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 09:18:10,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=22.5 2024-08-11 09:18:20,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1023470.0, ans=0.2 2024-08-11 09:18:21,904 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 900, loss[loss=0.09683, beats_loss=0.01249, ecapa_loss=0.0002311, whisper_loss=0.08204, over 19195.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01128, ecapa_loss=0.0001942, whisper_loss=0.09138, over 3789119.36 frames. ], batch size: 81, lr: 8.14e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:18:25,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1023470.0, ans=0.04949747468305833 2024-08-11 09:18:41,845 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 09:18:43,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1023570.0, ans=0.2 2024-08-11 09:19:03,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1023670.0, ans=0.1 2024-08-11 09:19:14,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1023770.0, ans=0.1 2024-08-11 09:19:15,770 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 09:19:16,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1023770.0, ans=0.2 2024-08-11 09:19:18,239 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-11 09:19:34,348 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.579e+05 2024-08-11 09:19:36,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 950, loss[loss=0.1185, beats_loss=0.00974, ecapa_loss=0.0001822, whisper_loss=0.107, over 18713.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001942, whisper_loss=0.09253, over 3784938.62 frames. 
], batch size: 71, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:19:38,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1023970.0, ans=0.0 2024-08-11 09:19:40,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.622e+01 2.876e+01 3.425e+01 6.209e+01, threshold=5.753e+01, percent-clipped=1.0 2024-08-11 09:19:40,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1023970.0, ans=0.125 2024-08-11 09:19:41,835 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 09:19:55,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1024070.0, ans=0.2 2024-08-11 09:20:31,270 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 09:20:31,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1024270.0, ans=0.0 2024-08-11 09:20:33,536 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 09:20:50,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1024370.0, ans=0.0 2024-08-11 09:20:53,351 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 09:21:00,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1000, loss[loss=0.1008, beats_loss=0.01486, ecapa_loss=0.0001893, whisper_loss=0.08407, over 21891.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01132, ecapa_loss=0.0001922, whisper_loss=0.09196, over 3793507.46 frames. 
], batch size: 90, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:21:06,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1024470.0, ans=0.125 2024-08-11 09:21:28,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1024570.0, ans=0.1 2024-08-11 09:21:38,881 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 09:21:45,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024670.0, ans=0.1 2024-08-11 09:21:57,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1024770.0, ans=0.1 2024-08-11 09:22:03,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1024770.0, ans=0.125 2024-08-11 09:22:09,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1024770.0, ans=0.125 2024-08-11 09:22:10,242 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 09:22:16,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2024-08-11 09:22:32,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1050, loss[loss=0.1222, beats_loss=0.01235, ecapa_loss=0.0001655, whisper_loss=0.1082, over 23225.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01129, ecapa_loss=0.0001934, whisper_loss=0.09184, over 3766295.61 frames. 
], batch size: 90, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:22:39,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.754e+01 3.061e+01 3.548e+01 9.955e+01, threshold=6.122e+01, percent-clipped=1.0 2024-08-11 09:22:47,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024970.0, ans=0.1 2024-08-11 09:23:00,686 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 09:23:06,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2024-08-11 09:23:28,752 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 09:23:32,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1025170.0, ans=0.125 2024-08-11 09:23:36,941 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-11 09:24:12,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1025370.0, ans=0.2 2024-08-11 09:24:21,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1100, loss[loss=0.09747, beats_loss=0.01097, ecapa_loss=0.000213, whisper_loss=0.08438, over 21122.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0113, ecapa_loss=0.0001929, whisper_loss=0.09178, over 3780869.17 frames. ], batch size: 86, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:24:44,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1025570.0, ans=0.125 2024-08-11 09:24:47,592 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
14 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-11 09:24:53,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2024-08-11 09:25:24,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1025770.0, ans=0.125 2024-08-11 09:25:26,120 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 09:25:40,602 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 09:25:45,197 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 09:25:48,710 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 09:25:59,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1025870.0, ans=0.0 2024-08-11 09:26:05,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-11 09:26:08,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1150, loss[loss=0.1206, beats_loss=0.01001, ecapa_loss=0.0002317, whisper_loss=0.1082, over 18903.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01131, ecapa_loss=0.000193, whisper_loss=0.09193, over 3808745.40 frames. ], batch size: 77, lr: 8.13e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:26:14,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.696e+01 3.045e+01 3.408e+01 7.482e+01, threshold=6.090e+01, percent-clipped=2.0 2024-08-11 09:26:32,297 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 09:26:40,467 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 09:27:03,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1026170.0, ans=0.0 2024-08-11 09:27:25,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1026270.0, ans=0.125 2024-08-11 09:27:34,193 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:27:35,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1026370.0, ans=0.125 2024-08-11 09:27:42,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1026370.0, ans=0.07 2024-08-11 09:27:46,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1026370.0, ans=0.0 2024-08-11 09:27:54,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1200, loss[loss=0.1165, beats_loss=0.009918, ecapa_loss=0.0002042, whisper_loss=0.1045, over 14800.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01124, ecapa_loss=0.0001944, whisper_loss=0.09231, over 3837539.24 frames. ], batch size: 55, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:28:18,317 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 09:28:18,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.08 vs. 
limit=22.5 2024-08-11 09:28:39,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1026670.0, ans=0.2 2024-08-11 09:28:41,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1026670.0, ans=0.07 2024-08-11 09:28:44,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-08-11 09:29:12,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1250, loss[loss=0.09279, beats_loss=0.0145, ecapa_loss=0.0001386, whisper_loss=0.07691, over 14651.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01129, ecapa_loss=0.0001944, whisper_loss=0.0918, over 3811742.16 frames. ], batch size: 56, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:29:17,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.549e+01 2.780e+01 3.273e+01 6.263e+01, threshold=5.560e+01, percent-clipped=1.0 2024-08-11 09:29:21,676 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 09:29:43,657 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 09:30:13,419 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 09:30:27,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1300, loss[loss=0.09886, beats_loss=0.009228, ecapa_loss=0.0002124, whisper_loss=0.08751, over 18035.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01132, ecapa_loss=0.0001931, whisper_loss=0.09162, over 3788991.95 frames. 
], batch size: 70, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:30:27,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1027470.0, ans=0.125 2024-08-11 09:30:36,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1027470.0, ans=0.0 2024-08-11 09:30:51,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1027570.0, ans=0.025 2024-08-11 09:31:05,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1027670.0, ans=0.0 2024-08-11 09:31:18,232 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 09:31:22,410 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-11 09:31:30,113 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-11 09:31:44,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1350, loss[loss=0.09915, beats_loss=0.01241, ecapa_loss=0.0002108, whisper_loss=0.08463, over 15523.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01133, ecapa_loss=0.0001919, whisper_loss=0.09146, over 3783605.25 frames. 
], batch size: 62, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:31:49,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.558e+01 2.922e+01 3.559e+01 4.960e+01, threshold=5.843e+01, percent-clipped=0.0 2024-08-11 09:32:29,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1028270.0, ans=0.125 2024-08-11 09:32:51,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1028370.0, ans=0.0 2024-08-11 09:32:53,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.26 vs. limit=10.0 2024-08-11 09:32:58,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1028470.0, ans=0.0 2024-08-11 09:32:59,707 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1400, loss[loss=0.08797, beats_loss=0.01299, ecapa_loss=0.000212, whisper_loss=0.07285, over 19287.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01132, ecapa_loss=0.0001908, whisper_loss=0.09152, over 3789891.60 frames. ], batch size: 83, lr: 8.12e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:33:28,168 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 09:33:28,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2024-08-11 09:33:33,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1028670.0, ans=0.125 2024-08-11 09:33:47,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. 
limit=15.0 2024-08-11 09:34:02,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1028870.0, ans=0.2 2024-08-11 09:34:07,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1028870.0, ans=0.05 2024-08-11 09:34:10,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-11 09:34:28,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1450, loss[loss=0.1205, beats_loss=0.009107, ecapa_loss=0.0001755, whisper_loss=0.1097, over 21105.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01123, ecapa_loss=0.0001912, whisper_loss=0.09195, over 3781274.71 frames. ], batch size: 80, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:34:28,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1028970.0, ans=0.125 2024-08-11 09:34:33,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.516e+01 2.871e+01 3.149e+01 4.386e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 09:34:33,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1028970.0, ans=0.2 2024-08-11 09:34:33,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1028970.0, ans=0.2 2024-08-11 09:34:43,609 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:34:52,754 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-11 09:35:02,340 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 09:35:10,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1029170.0, ans=0.125 2024-08-11 09:35:45,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.22 vs. limit=22.5 2024-08-11 09:35:48,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1500, loss[loss=0.09135, beats_loss=0.009956, ecapa_loss=0.0002593, whisper_loss=0.0788, over 15391.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01127, ecapa_loss=0.0001902, whisper_loss=0.09174, over 3791907.60 frames. ], batch size: 64, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:35:51,957 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 09:35:58,606 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.532e-02 2024-08-11 09:36:05,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2024-08-11 09:36:19,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1029670.0, ans=0.0 2024-08-11 09:36:27,363 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-11 09:36:29,222 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 09:36:45,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. 
limit=15.0 2024-08-11 09:36:50,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1029870.0, ans=0.125 2024-08-11 09:36:52,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1029870.0, ans=0.1 2024-08-11 09:37:07,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1550, loss[loss=0.1154, beats_loss=0.01124, ecapa_loss=0.0001874, whisper_loss=0.1023, over 14290.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01131, ecapa_loss=0.0001892, whisper_loss=0.09192, over 3799239.55 frames. ], batch size: 57, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:37:10,882 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 09:37:11,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.727e+01 2.976e+01 3.507e+01 6.642e+01, threshold=5.952e+01, percent-clipped=2.0 2024-08-11 09:37:12,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1029970.0, ans=0.1 2024-08-11 09:37:18,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2024-08-11 09:37:24,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.83 vs. 
limit=22.5 2024-08-11 09:37:36,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1030070.0, ans=0.125 2024-08-11 09:38:08,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1030270.0, ans=0.125 2024-08-11 09:38:18,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-08-11 09:38:26,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1600, loss[loss=0.1167, beats_loss=0.01099, ecapa_loss=0.000193, whisper_loss=0.1038, over 22514.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01127, ecapa_loss=0.0001899, whisper_loss=0.09173, over 3814755.52 frames. ], batch size: 89, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:38:28,443 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 09:38:30,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-11 09:38:32,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-08-11 09:38:36,518 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 09:38:58,359 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 09:39:04,160 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 09:39:24,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1030770.0, ans=0.0 2024-08-11 09:39:30,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.26 vs. limit=10.0 2024-08-11 09:39:34,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1030870.0, ans=0.125 2024-08-11 09:39:34,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.45 vs. limit=10.0 2024-08-11 09:39:40,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1030870.0, ans=0.0 2024-08-11 09:39:43,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1650, loss[loss=0.1122, beats_loss=0.009725, ecapa_loss=0.0002289, whisper_loss=0.1002, over 20663.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01133, ecapa_loss=0.0001903, whisper_loss=0.09235, over 3850982.65 frames. ], batch size: 82, lr: 8.11e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:39:47,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1030970.0, ans=0.125 2024-08-11 09:39:47,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. 
limit=15.0 2024-08-11 09:39:48,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.611e+01 2.904e+01 3.448e+01 5.228e+01, threshold=5.808e+01, percent-clipped=0.0 2024-08-11 09:39:53,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1030970.0, ans=0.0 2024-08-11 09:39:56,093 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 09:40:05,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2024-08-11 09:40:13,259 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 09:40:16,012 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 09:40:21,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-11 09:40:30,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1031270.0, ans=0.125 2024-08-11 09:40:31,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1031270.0, ans=0.2 2024-08-11 09:40:49,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2024-08-11 09:40:53,432 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 09:40:53,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1031370.0, ans=0.04949747468305833 2024-08-11 09:40:57,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1700, loss[loss=0.1061, beats_loss=0.01302, ecapa_loss=0.0001738, whisper_loss=0.09133, over 22804.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01129, ecapa_loss=0.0001891, whisper_loss=0.09311, over 3855500.24 frames. ], batch size: 92, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:41:07,967 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 09:41:13,975 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 09:41:18,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1031570.0, ans=0.1 2024-08-11 09:41:23,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1031570.0, ans=0.0 2024-08-11 09:41:36,562 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-11 09:41:42,189 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 09:41:50,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2024-08-11 09:42:02,589 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 09:42:09,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1750, loss[loss=0.1084, beats_loss=0.01106, ecapa_loss=0.0001758, whisper_loss=0.09561, over 15251.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01126, ecapa_loss=0.0001895, whisper_loss=0.09314, over 3852951.68 frames. ], batch size: 59, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:42:13,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.694e+01 3.096e+01 3.648e+01 5.495e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 09:42:13,585 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 from AS 2024-08-11 09:42:22,360 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 09:42:46,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2024-08-11 09:42:50,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1032170.0, ans=0.125 2024-08-11 09:42:51,418 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 from AS 2024-08-11 09:43:02,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1032270.0, ans=0.0 2024-08-11 09:43:16,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1032370.0, ans=0.125 2024-08-11 09:43:21,917 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1800, loss[loss=0.1251, beats_loss=0.01148, ecapa_loss=0.0001727, whisper_loss=0.1119, over 17893.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01122, ecapa_loss=0.0001891, whisper_loss=0.09354, over 3878176.73 frames. ], batch size: 68, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:43:24,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. 
limit=15.0 2024-08-11 09:43:36,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1032570.0, ans=0.2 2024-08-11 09:43:51,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-11 09:44:26,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1032870.0, ans=0.1 2024-08-11 09:44:35,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1850, loss[loss=0.09153, beats_loss=0.009517, ecapa_loss=0.0002266, whisper_loss=0.07975, over 16320.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01119, ecapa_loss=0.0001907, whisper_loss=0.09304, over 3864691.66 frames. ], batch size: 64, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:44:37,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1032970.0, ans=0.125 2024-08-11 09:44:39,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.564e+01 2.931e+01 3.381e+01 4.621e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-11 09:44:50,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1033070.0, ans=15.0 2024-08-11 09:45:15,727 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 10 from LS+wenet, 19 from Vox, 27 from AS 2024-08-11 09:45:41,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-08-11 09:45:47,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1900, loss[loss=0.08547, beats_loss=0.016, ecapa_loss=9.973e-05, whisper_loss=0.06847, over 14344.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01131, ecapa_loss=0.0001921, whisper_loss=0.0922, over 3853797.98 frames. ], batch size: 53, lr: 8.10e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:45:50,192 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 09:46:01,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1033570.0, ans=0.125 2024-08-11 09:46:08,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-11 09:46:09,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-11 09:46:20,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1033670.0, ans=0.5 2024-08-11 09:46:38,604 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 from AS 2024-08-11 09:46:44,549 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 09:46:46,364 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 25 from Vox, 34 from AS 2024-08-11 09:46:50,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1033870.0, ans=0.0 2024-08-11 09:47:00,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 1950, loss[loss=0.1153, beats_loss=0.009763, ecapa_loss=0.0002028, whisper_loss=0.1035, over 18371.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.000194, whisper_loss=0.09269, over 3844316.48 frames. ], batch size: 71, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:47:03,959 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
18 from LS+wenet, 18 from Vox, 38 from AS 2024-08-11 09:47:05,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.682e+01 2.998e+01 3.589e+01 5.098e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 09:47:07,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1033970.0, ans=0.125 2024-08-11 09:47:11,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1033970.0, ans=0.125 2024-08-11 09:47:26,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1034070.0, ans=0.07 2024-08-11 09:47:37,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0 2024-08-11 09:47:41,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1034170.0, ans=0.0 2024-08-11 09:47:43,903 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 27 from Vox, 31 from AS 2024-08-11 09:48:02,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1034370.0, ans=0.125 2024-08-11 09:48:13,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2000, loss[loss=0.09177, beats_loss=0.01281, ecapa_loss=0.0001958, whisper_loss=0.077, over 23457.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01122, ecapa_loss=0.000196, whisper_loss=0.09245, over 3810282.88 frames. 
], batch size: 92, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:48:17,760 WARNING [optim.py:496] (0/4) Scaling gradients by 0.059571195393800735, model_norm_threshold=59.96577072143555 2024-08-11 09:48:17,964 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.97, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.877e+05, grad_sumsq=1.108e+05, orig_rms_sq=8.917e+00 2024-08-11 09:48:29,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1034570.0, ans=0.125 2024-08-11 09:48:33,660 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 20 from Vox, 34 from AS 2024-08-11 09:49:13,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1034870.0, ans=10.0 2024-08-11 09:49:16,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2024-08-11 09:49:20,639 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS 2024-08-11 09:49:22,001 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 from AS 2024-08-11 09:49:27,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2050, loss[loss=0.1166, beats_loss=0.008583, ecapa_loss=0.0002586, whisper_loss=0.1055, over 21254.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001971, whisper_loss=0.09246, over 3789499.30 frames. 
], batch size: 87, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:49:31,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.681e+01 2.944e+01 3.350e+01 1.007e+03, threshold=5.888e+01, percent-clipped=2.0 2024-08-11 09:49:38,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1034970.0, ans=0.0 2024-08-11 09:49:46,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1035070.0, ans=0.125 2024-08-11 09:50:00,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1035170.0, ans=0.125 2024-08-11 09:50:13,245 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 from AS 2024-08-11 09:50:17,776 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 24 from Vox, 45 from AS 2024-08-11 09:50:26,178 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 from AS 2024-08-11 09:50:40,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2100, loss[loss=0.09508, beats_loss=0.01067, ecapa_loss=0.0001872, whisper_loss=0.08254, over 16206.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01129, ecapa_loss=0.0001979, whisper_loss=0.09252, over 3814839.84 frames. ], batch size: 65, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:50:41,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1035470.0, ans=0.125 2024-08-11 09:50:52,705 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 from AS 2024-08-11 09:51:11,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035670.0, ans=0.1 2024-08-11 09:51:31,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1035770.0, ans=0.0 2024-08-11 09:51:37,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=10.0 2024-08-11 09:51:38,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1035870.0, ans=0.0 2024-08-11 09:51:40,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1035870.0, ans=0.125 2024-08-11 09:51:50,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1035870.0, ans=0.125 2024-08-11 09:51:54,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2150, loss[loss=0.1171, beats_loss=0.01022, ecapa_loss=0.0002359, whisper_loss=0.1046, over 21871.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01129, ecapa_loss=0.0001968, whisper_loss=0.09292, over 3829384.02 frames. 
], batch size: 92, lr: 8.09e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:51:58,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.546e+01 2.848e+01 3.381e+01 6.507e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 09:52:09,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1036070.0, ans=0.2 2024-08-11 09:52:20,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1036070.0, ans=0.0 2024-08-11 09:52:22,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0 2024-08-11 09:52:25,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=15.0 2024-08-11 09:52:33,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1036170.0, ans=0.1 2024-08-11 09:52:36,484 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 from AS 2024-08-11 09:52:53,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1036370.0, ans=0.0 2024-08-11 09:53:06,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2200, loss[loss=0.09552, beats_loss=0.01429, ecapa_loss=0.0001545, whisper_loss=0.07968, over 20854.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01131, ecapa_loss=0.0001979, whisper_loss=0.09262, over 3810525.40 frames. ], batch size: 83, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:53:07,032 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 13 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 09:53:17,151 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
26 from LS+wenet, 13 from Vox, 22 from AS 2024-08-11 09:53:21,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1036570.0, ans=0.1 2024-08-11 09:53:29,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1036570.0, ans=0.09899494936611666 2024-08-11 09:53:47,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1036770.0, ans=0.2 2024-08-11 09:53:49,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1036770.0, ans=0.2 2024-08-11 09:54:12,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.594e-03 2024-08-11 09:54:15,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2250, loss[loss=0.1142, beats_loss=0.00737, ecapa_loss=0.0001912, whisper_loss=0.1049, over 16278.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0113, ecapa_loss=0.0001994, whisper_loss=0.09294, over 3791140.27 frames. ], batch size: 59, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:54:19,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.681e+01 2.914e+01 3.367e+01 5.391e+01, threshold=5.828e+01, percent-clipped=0.0 2024-08-11 09:54:19,995 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.982e-02 2024-08-11 09:54:20,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-11 09:54:57,036 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 09:55:02,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1037270.0, ans=0.125 2024-08-11 09:55:04,666 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 19 from Vox, 42 from AS 2024-08-11 09:55:14,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-11 09:55:20,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1037470.0, ans=0.125 2024-08-11 09:55:21,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2300, loss[loss=0.1259, beats_loss=0.01079, ecapa_loss=0.0002223, whisper_loss=0.1129, over 22616.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.0001997, whisper_loss=0.09351, over 3832762.93 frames. ], batch size: 90, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:55:33,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1037470.0, ans=0.0 2024-08-11 09:55:39,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1037570.0, ans=0.125 2024-08-11 09:55:45,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-11 09:55:53,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1037670.0, ans=0.2 2024-08-11 09:56:01,992 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
25 from LS+wenet, 14 from Vox, 27 from AS 2024-08-11 09:56:13,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-11 09:56:17,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-08-11 09:56:27,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2350, loss[loss=0.1084, beats_loss=0.01316, ecapa_loss=0.0001795, whisper_loss=0.09342, over 22292.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01133, ecapa_loss=0.0001997, whisper_loss=0.09342, over 3851642.84 frames. ], batch size: 90, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:56:30,472 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 09:56:31,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.661e+01 3.016e+01 3.402e+01 1.211e+02, threshold=6.032e+01, percent-clipped=3.0 2024-08-11 09:56:37,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1037970.0, ans=0.0 2024-08-11 09:56:50,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1038070.0, ans=0.125 2024-08-11 09:57:07,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0 2024-08-11 09:57:30,390 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 09:57:33,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2400, loss[loss=0.09899, beats_loss=0.01339, ecapa_loss=0.0001752, whisper_loss=0.08385, over 21892.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01127, ecapa_loss=0.0002006, whisper_loss=0.09345, over 3856588.47 frames. ], batch size: 90, lr: 8.08e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:57:40,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1038470.0, ans=0.0 2024-08-11 09:57:46,156 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS 2024-08-11 09:58:09,917 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 31 from Vox, 36 from AS 2024-08-11 09:58:15,743 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS 2024-08-11 09:58:16,848 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 from AS 2024-08-11 09:58:18,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1038770.0, ans=0.125 2024-08-11 09:58:19,507 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 18 from Vox, 19 from AS 2024-08-11 09:58:22,216 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 21 from Vox, 36 from AS 2024-08-11 09:58:27,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-11 09:58:32,553 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 33 from LS+wenet, 15 from Vox, 33 from AS 2024-08-11 09:58:35,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.16 vs. limit=22.5 2024-08-11 09:58:39,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2450, loss[loss=0.1111, beats_loss=0.01042, ecapa_loss=0.0002157, whisper_loss=0.09853, over 16569.00 frames. 
], tot_loss[loss=0.1066, beats_loss=0.01126, ecapa_loss=0.0002007, whisper_loss=0.09333, over 3878271.95 frames. ], batch size: 65, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:58:43,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.701e+01 2.979e+01 3.423e+01 5.204e+01, threshold=5.958e+01, percent-clipped=0.0 2024-08-11 09:58:43,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1038970.0, ans=0.0 2024-08-11 09:58:49,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1038970.0, ans=0.1 2024-08-11 09:59:00,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1039070.0, ans=0.5 2024-08-11 09:59:22,643 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 14 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 09:59:37,846 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS 2024-08-11 09:59:44,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2500, loss[loss=0.1087, beats_loss=0.01181, ecapa_loss=0.000168, whisper_loss=0.09522, over 18469.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01122, ecapa_loss=0.000201, whisper_loss=0.09344, over 3876026.32 frames. ], batch size: 71, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 09:59:45,644 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 from AS 2024-08-11 10:00:12,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1039670.0, ans=0.1 2024-08-11 10:00:12,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1039670.0, ans=0.0 2024-08-11 10:00:18,513 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-11 10:00:28,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-11 10:00:34,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1039770.0, ans=0.0 2024-08-11 10:00:38,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1039870.0, ans=0.0 2024-08-11 10:00:40,717 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 from AS 2024-08-11 10:00:46,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1039870.0, ans=0.125 2024-08-11 10:00:49,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2550, loss[loss=0.1018, beats_loss=0.01303, ecapa_loss=0.0001985, whisper_loss=0.08677, over 20444.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01114, ecapa_loss=0.0002013, whisper_loss=0.0938, over 3896622.88 frames. 
], batch size: 84, lr: 8.07e-03, grad_scale: 7.205759403792794e+16 2024-08-11 10:00:53,155 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-104000.pt 2024-08-11 10:00:57,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.767e+01 3.292e+01 3.693e+01 5.376e+01, threshold=6.584e+01, percent-clipped=0.0 2024-08-11 10:00:59,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1039970.0, ans=0.125 2024-08-11 10:01:11,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1040070.0, ans=0.05 2024-08-11 10:01:13,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1040070.0, ans=0.125 2024-08-11 10:01:17,107 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 31 from Vox, 24 from AS 2024-08-11 10:01:21,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1040170.0, ans=0.025 2024-08-11 10:01:31,380 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 from AS 2024-08-11 10:01:53,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-11 10:01:56,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2024-08-11 10:01:59,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2600, loss[loss=0.1014, beats_loss=0.008206, ecapa_loss=0.0002452, whisper_loss=0.09079, over 14514.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0002018, whisper_loss=0.09346, over 3890077.07 frames. ], batch size: 59, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:02:05,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1040470.0, ans=0.125 2024-08-11 10:02:08,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1040470.0, ans=0.125 2024-08-11 10:02:11,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1040570.0, ans=0.125 2024-08-11 10:02:25,939 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 10:02:29,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2024-08-11 10:02:32,668 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 10:02:41,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1040770.0, ans=0.0 2024-08-11 10:02:45,635 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 from AS 2024-08-11 10:02:50,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. 
limit=10.0 2024-08-11 10:03:02,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1040870.0, ans=0.125 2024-08-11 10:03:05,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2650, loss[loss=0.1083, beats_loss=0.0107, ecapa_loss=0.0002074, whisper_loss=0.09548, over 22437.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01117, ecapa_loss=0.0002015, whisper_loss=0.09356, over 3897620.63 frames. ], batch size: 89, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:03:09,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-11 10:03:09,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+01 2.707e+01 2.925e+01 3.318e+01 6.568e+01, threshold=5.849e+01, percent-clipped=0.0 2024-08-11 10:03:32,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2024-08-11 10:03:38,747 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 21 from Vox, 30 from AS 2024-08-11 10:03:41,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1041170.0, ans=0.0 2024-08-11 10:03:42,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1041170.0, ans=0.0 2024-08-11 10:03:48,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1041270.0, ans=0.125 2024-08-11 10:03:49,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1041270.0, ans=0.035 2024-08-11 10:04:11,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2700, loss[loss=0.1099, beats_loss=0.01362, ecapa_loss=0.0001855, whisper_loss=0.09444, over 17693.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01122, ecapa_loss=0.0001999, whisper_loss=0.09334, over 3875155.40 frames. ], batch size: 68, lr: 8.07e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:04:18,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1041470.0, ans=0.0 2024-08-11 10:04:21,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1041470.0, ans=0.0 2024-08-11 10:04:25,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1041570.0, ans=0.125 2024-08-11 10:04:27,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1041570.0, ans=0.2 2024-08-11 10:04:41,283 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS 2024-08-11 10:04:44,877 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 21 from Vox, 35 from AS 2024-08-11 10:04:47,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1041670.0, ans=0.125 2024-08-11 10:04:47,744 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-11 10:04:55,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1041770.0, ans=0.125 2024-08-11 10:05:01,001 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 from AS 2024-08-11 10:05:01,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1041770.0, ans=0.2 2024-08-11 10:05:17,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1041970.0, ans=0.1 2024-08-11 10:05:18,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2750, loss[loss=0.09486, beats_loss=0.009634, ecapa_loss=0.000215, whisper_loss=0.08307, over 20247.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01125, ecapa_loss=0.0001998, whisper_loss=0.09314, over 3877367.17 frames. ], batch size: 81, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:05:22,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.665e+01 2.980e+01 3.281e+01 5.234e+01, threshold=5.959e+01, percent-clipped=0.0 2024-08-11 10:05:30,179 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 19 from Vox, 34 from AS 2024-08-11 10:05:31,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-08-11 10:06:06,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.49 vs. 
limit=12.0 2024-08-11 10:06:15,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2024-08-11 10:06:24,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2800, loss[loss=0.1318, beats_loss=0.008268, ecapa_loss=0.0001743, whisper_loss=0.1218, over 16138.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0001986, whisper_loss=0.093, over 3900898.56 frames. ], batch size: 58, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:06:25,502 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 10:06:25,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1042470.0, ans=0.125 2024-08-11 10:06:26,695 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 10:06:27,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1042470.0, ans=0.125 2024-08-11 10:06:36,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1042570.0, ans=0.0 2024-08-11 10:06:47,773 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 10:06:58,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1042670.0, ans=0.125 2024-08-11 10:06:59,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. 
limit=22.5 2024-08-11 10:06:59,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1042670.0, ans=0.0 2024-08-11 10:07:05,224 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 10:07:08,990 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 10:07:13,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1042770.0, ans=0.125 2024-08-11 10:07:26,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1042870.0, ans=0.2 2024-08-11 10:07:28,486 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 10:07:29,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2850, loss[loss=0.1046, beats_loss=0.01239, ecapa_loss=0.0001658, whisper_loss=0.09054, over 23655.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0114, ecapa_loss=0.0001982, whisper_loss=0.09331, over 3897357.54 frames. ], batch size: 94, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:07:33,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.749e+01 2.990e+01 3.438e+01 5.063e+01, threshold=5.981e+01, percent-clipped=0.0 2024-08-11 10:07:37,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1042970.0, ans=0.125 2024-08-11 10:07:42,993 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
20 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 10:07:58,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1043170.0, ans=0.125 2024-08-11 10:08:01,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1043170.0, ans=0.0 2024-08-11 10:08:01,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-11 10:08:05,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1043170.0, ans=0.125 2024-08-11 10:08:10,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1043270.0, ans=0.125 2024-08-11 10:08:11,776 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 10:08:29,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1043370.0, ans=0.2 2024-08-11 10:08:35,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2900, loss[loss=0.108, beats_loss=0.007937, ecapa_loss=0.0001799, whisper_loss=0.09822, over 14779.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01143, ecapa_loss=0.000199, whisper_loss=0.09353, over 3896782.48 frames. ], batch size: 53, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:08:49,430 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 10:09:07,865 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
29 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 10:09:09,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1043670.0, ans=0.1 2024-08-11 10:09:12,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1043670.0, ans=0.2 2024-08-11 10:09:13,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-11 10:09:42,035 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 2950, loss[loss=0.108, beats_loss=0.0112, ecapa_loss=0.0001699, whisper_loss=0.09509, over 20087.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01148, ecapa_loss=0.0001987, whisper_loss=0.09349, over 3903986.11 frames. ], batch size: 76, lr: 8.06e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:09:45,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+01 2.608e+01 2.908e+01 3.326e+01 5.190e+01, threshold=5.815e+01, percent-clipped=0.0 2024-08-11 10:09:47,429 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 10:09:53,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-11 10:09:59,459 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 10:10:03,431 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 10:10:07,397 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 10:10:09,990 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 10:10:36,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1044370.0, ans=0.0 2024-08-11 10:10:44,348 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 10:10:47,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3000, loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0002213, whisper_loss=0.08942, over 17191.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01137, ecapa_loss=0.0002003, whisper_loss=0.09381, over 3897723.24 frames. ], batch size: 70, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:10:47,948 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 10:11:27,186 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006456, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 10:11:45,396 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on SV_voxceleb1: loss=0.005368, beats_loss=0, ecapa_loss=0.0005368, whisper_loss=0, over 939242.00 frames. 2024-08-11 10:12:43,757 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.7494e-09, 4.4911e-02, 5.0476e-03, 2.6008e-02, 2.0691e-03, 7.0758e-02, 6.0355e-02, 4.1158e-02], device='cuda:0') 2024-08-11 10:13:42,423 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on AT_audioset: loss=0.02512, beats_loss=0.02512, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 10:13:42,428 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 10:14:08,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=15.0 2024-08-11 10:14:10,591 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 10:14:13,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1044670.0, ans=0.07 2024-08-11 10:14:20,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1044670.0, ans=0.2 2024-08-11 10:14:26,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1044770.0, ans=0.0 2024-08-11 10:14:32,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1044770.0, ans=0.125 2024-08-11 10:14:49,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3050, loss[loss=0.1177, beats_loss=0.01031, ecapa_loss=0.0001751, whisper_loss=0.1056, over 22885.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01137, ecapa_loss=0.0001995, whisper_loss=0.09385, over 3897787.05 frames. ], batch size: 87, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:14:50,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1044970.0, ans=0.0 2024-08-11 10:14:53,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.751e+01 3.093e+01 3.441e+01 4.563e+01, threshold=6.185e+01, percent-clipped=0.0 2024-08-11 10:15:02,037 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 13 from Vox, 50 fro AS 2024-08-11 10:15:06,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1045070.0, ans=0.125 2024-08-11 10:15:15,139 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 10:15:33,974 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 10:15:35,262 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 10:15:46,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1045370.0, ans=0.125 2024-08-11 10:15:47,573 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 10:15:56,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3100, loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001754, whisper_loss=0.09076, over 23358.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01145, ecapa_loss=0.0001994, whisper_loss=0.09369, over 3885563.42 frames. ], batch size: 91, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:15:56,678 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 10:16:21,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1045670.0, ans=0.0 2024-08-11 10:16:48,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1045870.0, ans=0.125 2024-08-11 10:16:51,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1045870.0, ans=0.1 2024-08-11 10:17:00,458 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0893079861998558, model_norm_threshold=61.852699279785156 2024-08-11 10:17:00,619 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.682e+05, grad_sumsq=5.221e+04, orig_rms_sq=8.968e+00 2024-08-11 10:17:03,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3150, loss[loss=0.1067, beats_loss=0.009957, ecapa_loss=0.0002169, whisper_loss=0.09453, over 21534.00 frames. 
], tot_loss[loss=0.1072, beats_loss=0.01146, ecapa_loss=0.0002, whisper_loss=0.09371, over 3880838.03 frames. ], batch size: 88, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:17:07,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.851e+01 3.278e+01 3.632e+01 6.926e+02, threshold=6.555e+01, percent-clipped=1.0 2024-08-11 10:17:18,388 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 10:17:27,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1046070.0, ans=0.04949747468305833 2024-08-11 10:17:36,903 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 10:18:08,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1046470.0, ans=0.0 2024-08-11 10:18:09,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3200, loss[loss=0.1357, beats_loss=0.009525, ecapa_loss=0.0001882, whisper_loss=0.1243, over 23313.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01144, ecapa_loss=0.0002012, whisper_loss=0.09408, over 3870033.98 frames. ], batch size: 89, lr: 8.05e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:18:12,824 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 10:18:15,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.55 vs. limit=15.0 2024-08-11 10:18:43,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-11 10:18:47,092 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
18 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 10:18:51,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1046770.0, ans=0.125 2024-08-11 10:18:59,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1046770.0, ans=0.0 2024-08-11 10:19:07,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1046870.0, ans=0.0 2024-08-11 10:19:08,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2024-08-11 10:19:16,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3250, loss[loss=0.09379, beats_loss=0.01113, ecapa_loss=0.0001967, whisper_loss=0.0807, over 13996.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01141, ecapa_loss=0.0002018, whisper_loss=0.09383, over 3843927.34 frames. 
], batch size: 56, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:19:20,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.129e+01 2.734e+01 3.207e+01 3.832e+01 6.451e+01, threshold=6.414e+01, percent-clipped=0.0 2024-08-11 10:19:20,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046970.0, ans=0.1 2024-08-11 10:19:37,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1047070.0, ans=0.0 2024-08-11 10:19:42,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1047170.0, ans=0.1 2024-08-11 10:20:01,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1047270.0, ans=0.125 2024-08-11 10:20:16,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1047370.0, ans=0.0 2024-08-11 10:20:20,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1047370.0, ans=0.0 2024-08-11 10:20:22,944 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3300, loss[loss=0.08703, beats_loss=0.008395, ecapa_loss=0.0001888, whisper_loss=0.07674, over 15101.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01144, ecapa_loss=0.0002015, whisper_loss=0.09368, over 3846743.08 frames. 
], batch size: 57, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:20:32,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1047470.0, ans=0.125 2024-08-11 10:20:58,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1047670.0, ans=0.125 2024-08-11 10:21:11,931 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 10:21:30,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3350, loss[loss=0.1005, beats_loss=0.01195, ecapa_loss=0.0001922, whisper_loss=0.08661, over 19354.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01129, ecapa_loss=0.0002011, whisper_loss=0.09342, over 3834576.25 frames. ], batch size: 76, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:21:32,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.79 vs. limit=22.5 2024-08-11 10:21:34,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+01 2.767e+01 3.123e+01 3.740e+01 5.333e+01, threshold=6.246e+01, percent-clipped=0.0 2024-08-11 10:21:55,886 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 10:22:01,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. 
limit=15.0 2024-08-11 10:22:02,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1048170.0, ans=0.04949747468305833 2024-08-11 10:22:19,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1048270.0, ans=0.0 2024-08-11 10:22:20,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=12.0 2024-08-11 10:22:37,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3400, loss[loss=0.1031, beats_loss=0.01372, ecapa_loss=0.000176, whisper_loss=0.08764, over 18157.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0114, ecapa_loss=0.0002014, whisper_loss=0.09266, over 3874450.67 frames. ], batch size: 71, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:22:37,819 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.596e-02 2024-08-11 10:23:00,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2024-08-11 10:23:03,363 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 10:23:11,111 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 10:23:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1048670.0, ans=0.125 2024-08-11 10:23:14,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1048670.0, ans=0.2 2024-08-11 10:23:17,251 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
12 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 10:23:40,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-08-11 10:23:46,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3450, loss[loss=0.121, beats_loss=0.01138, ecapa_loss=0.0001722, whisper_loss=0.1079, over 22141.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01144, ecapa_loss=0.0002007, whisper_loss=0.09223, over 3906170.83 frames. ], batch size: 86, lr: 8.04e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:23:50,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.572e+01 2.937e+01 3.389e+01 1.105e+02, threshold=5.874e+01, percent-clipped=1.0 2024-08-11 10:23:51,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1048970.0, ans=0.125 2024-08-11 10:24:05,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1049070.0, ans=0.125 2024-08-11 10:24:11,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.92 vs. limit=22.5 2024-08-11 10:24:16,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1049170.0, ans=0.125 2024-08-11 10:24:30,763 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 10:24:36,362 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 10:24:41,911 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 10:24:55,041 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3500, loss[loss=0.1068, beats_loss=0.01142, ecapa_loss=0.0001709, whisper_loss=0.09367, over 18372.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01144, ecapa_loss=0.0002014, whisper_loss=0.0927, over 3927251.78 frames. ], batch size: 69, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:24:59,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1049470.0, ans=0.2 2024-08-11 10:25:38,721 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 10:25:44,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1049770.0, ans=0.2 2024-08-11 10:25:47,105 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 10:25:48,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1049870.0, ans=0.0 2024-08-11 10:25:52,547 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 10:25:56,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1049870.0, ans=0.07 2024-08-11 10:26:02,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3550, loss[loss=0.09835, beats_loss=0.01311, ecapa_loss=0.000166, whisper_loss=0.08358, over 19081.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01137, ecapa_loss=0.0002016, whisper_loss=0.09289, over 3886596.15 frames. 
], batch size: 73, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:26:07,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.679e+01 2.987e+01 3.672e+01 5.992e+01, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 10:26:07,425 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 10:26:08,732 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 10:26:23,785 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 10:26:25,283 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 10:26:26,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1050070.0, ans=0.125 2024-08-11 10:26:34,281 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.287e+02 2024-08-11 10:26:46,288 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 10:26:57,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-08-11 10:26:59,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1050370.0, ans=0.0 2024-08-11 10:26:59,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1050370.0, ans=0.125 2024-08-11 10:27:00,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2024-08-11 10:27:03,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050370.0, ans=0.1 2024-08-11 10:27:11,551 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 10:27:12,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3600, loss[loss=0.1006, beats_loss=0.0132, ecapa_loss=0.0001881, whisper_loss=0.08551, over 19049.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01134, ecapa_loss=0.0002021, whisper_loss=0.09286, over 3852668.29 frames. ], batch size: 78, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:27:16,780 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 10:27:20,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-11 10:27:30,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1050570.0, ans=0.125 2024-08-11 10:27:38,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1050670.0, ans=0.0 2024-08-11 10:27:57,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1050770.0, ans=0.1 2024-08-11 10:28:18,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1050870.0, ans=0.0 2024-08-11 10:28:19,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1050870.0, ans=0.1 2024-08-11 10:28:20,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. 
limit=15.0 2024-08-11 10:28:22,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3650, loss[loss=0.08434, beats_loss=0.01434, ecapa_loss=0.0002074, whisper_loss=0.06793, over 17755.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01141, ecapa_loss=0.0002024, whisper_loss=0.09277, over 3851826.88 frames. ], batch size: 76, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:28:23,770 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 10:28:25,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1050970.0, ans=0.2 2024-08-11 10:28:26,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.096e+01 2.681e+01 3.041e+01 3.404e+01 5.123e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 10:28:42,691 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 10:28:50,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1051170.0, ans=0.0 2024-08-11 10:28:54,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1051170.0, ans=0.125 2024-08-11 10:28:56,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1051170.0, ans=0.0 2024-08-11 10:29:01,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1051170.0, ans=0.2 2024-08-11 10:29:05,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1051270.0, ans=0.0 2024-08-11 10:29:15,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1051270.0, ans=0.125 2024-08-11 10:29:21,379 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1051370.0, ans=0.09899494936611666 2024-08-11 10:29:33,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3700, loss[loss=0.1125, beats_loss=0.009833, ecapa_loss=0.0002271, whisper_loss=0.1004, over 20043.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01131, ecapa_loss=0.0002033, whisper_loss=0.09313, over 3822185.50 frames. ], batch size: 83, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:29:42,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-08-11 10:29:42,598 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 10:29:58,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1051570.0, ans=0.0 2024-08-11 10:30:14,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1051670.0, ans=0.025 2024-08-11 10:30:14,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-11 10:30:40,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1051870.0, ans=0.125 2024-08-11 10:30:42,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1051870.0, ans=0.125 2024-08-11 10:30:45,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3750, loss[loss=0.1096, beats_loss=0.01263, ecapa_loss=0.0001722, whisper_loss=0.09529, over 22189.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01142, ecapa_loss=0.0002021, whisper_loss=0.09328, over 3866738.84 frames. 
], batch size: 88, lr: 8.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:30:47,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1051970.0, ans=0.09899494936611666 2024-08-11 10:30:48,455 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-11 10:30:49,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.786e+01 3.057e+01 3.501e+01 5.299e+01, threshold=6.113e+01, percent-clipped=0.0 2024-08-11 10:31:06,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-11 10:31:18,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1052170.0, ans=0.0 2024-08-11 10:31:20,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1052170.0, ans=0.0 2024-08-11 10:31:25,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1052170.0, ans=0.125 2024-08-11 10:31:34,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1052270.0, ans=0.2 2024-08-11 10:31:50,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1052370.0, ans=0.125 2024-08-11 10:31:54,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2024-08-11 10:31:55,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3800, loss[loss=0.1016, beats_loss=0.009783, ecapa_loss=0.0002291, whisper_loss=0.08956, over 17794.00 frames. 
], tot_loss[loss=0.1071, beats_loss=0.01136, ecapa_loss=0.0002034, whisper_loss=0.09368, over 3861143.47 frames. ], batch size: 71, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:32:18,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-11 10:32:19,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1052570.0, ans=0.125 2024-08-11 10:32:22,012 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 10:32:24,902 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 10:32:38,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1052770.0, ans=0.2 2024-08-11 10:32:56,773 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 10:33:00,058 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 10:33:06,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3850, loss[loss=0.09276, beats_loss=0.01281, ecapa_loss=0.0002377, whisper_loss=0.07757, over 19727.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01144, ecapa_loss=0.0002019, whisper_loss=0.09342, over 3884858.93 frames. ], batch size: 84, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:33:10,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+01 2.764e+01 3.232e+01 3.837e+01 5.936e+01, threshold=6.465e+01, percent-clipped=0.0 2024-08-11 10:33:32,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.52 vs. 
limit=12.0 2024-08-11 10:33:36,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-11 10:33:47,007 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 10:33:57,888 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 10:34:01,976 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 10:34:16,361 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3900, loss[loss=0.1247, beats_loss=0.009463, ecapa_loss=0.0002244, whisper_loss=0.113, over 14060.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.0114, ecapa_loss=0.0002023, whisper_loss=0.09426, over 3913179.77 frames. ], batch size: 56, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:34:29,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1053570.0, ans=0.0 2024-08-11 10:34:55,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=15.0 2024-08-11 10:35:14,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1053870.0, ans=0.125 2024-08-11 10:35:27,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 3950, loss[loss=0.1201, beats_loss=0.0115, ecapa_loss=0.0002281, whisper_loss=0.1063, over 21501.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01138, ecapa_loss=0.0002023, whisper_loss=0.09469, over 3901581.04 frames. 
], batch size: 89, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:35:31,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1053970.0, ans=0.1 2024-08-11 10:35:32,094 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.817e+01 3.170e+01 3.825e+01 1.516e+02, threshold=6.340e+01, percent-clipped=2.0 2024-08-11 10:35:32,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1053970.0, ans=0.125 2024-08-11 10:35:35,044 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 19 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 10:36:08,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1054170.0, ans=0.1 2024-08-11 10:36:21,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1054270.0, ans=15.0 2024-08-11 10:36:23,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2024-08-11 10:36:24,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1054270.0, ans=0.1 2024-08-11 10:36:33,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1054370.0, ans=0.0 2024-08-11 10:36:34,598 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 10:36:40,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1054370.0, ans=0.0 2024-08-11 10:36:42,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4000, loss[loss=0.0989, beats_loss=0.013, ecapa_loss=0.0001676, whisper_loss=0.08423, over 16382.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01141, ecapa_loss=0.0002028, whisper_loss=0.09379, over 3886780.78 frames. ], batch size: 62, lr: 8.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:36:45,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=8.0 2024-08-11 10:37:04,476 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 10:37:12,953 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-11 10:37:31,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1054770.0, ans=0.2 2024-08-11 10:37:32,802 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 10:37:48,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1054870.0, ans=0.0 2024-08-11 10:37:49,706 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 10:37:58,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4050, loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0002202, whisper_loss=0.09172, over 20962.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01131, ecapa_loss=0.0002021, whisper_loss=0.09474, over 3894570.19 frames. 
], batch size: 83, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:38:03,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.646e+01 2.921e+01 3.336e+01 5.282e+01, threshold=5.841e+01, percent-clipped=0.0 2024-08-11 10:38:07,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-11 10:38:17,481 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 27 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-11 10:38:49,562 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 10:38:59,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1055370.0, ans=0.0 2024-08-11 10:39:06,119 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 10:39:15,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4100, loss[loss=0.08919, beats_loss=0.01451, ecapa_loss=0.0001689, whisper_loss=0.07299, over 19953.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01133, ecapa_loss=0.0002017, whisper_loss=0.09425, over 3865900.51 frames. ], batch size: 80, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:39:15,327 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 10:39:20,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1055470.0, ans=0.125 2024-08-11 10:39:29,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1055570.0, ans=0.125 2024-08-11 10:39:29,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1055570.0, ans=0.2 2024-08-11 10:39:29,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-11 10:40:02,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1055770.0, ans=0.0 2024-08-11 10:40:17,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1055870.0, ans=0.125 2024-08-11 10:40:25,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055870.0, ans=0.1 2024-08-11 10:40:34,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4150, loss[loss=0.09002, beats_loss=0.0158, ecapa_loss=0.0001772, whisper_loss=0.07245, over 14663.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01134, ecapa_loss=0.0002019, whisper_loss=0.09411, over 3895510.99 frames. 
], batch size: 60, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:40:34,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1055970.0, ans=0.0 2024-08-11 10:40:37,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1055970.0, ans=0.125 2024-08-11 10:40:38,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.691e+01 3.023e+01 3.383e+01 1.135e+02, threshold=6.046e+01, percent-clipped=2.0 2024-08-11 10:40:53,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1056070.0, ans=0.2 2024-08-11 10:40:59,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-11 10:41:00,511 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 10:41:03,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1056170.0, ans=0.125 2024-08-11 10:41:08,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1056170.0, ans=0.125 2024-08-11 10:41:24,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1056270.0, ans=0.0 2024-08-11 10:41:46,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1056470.0, ans=0.125 2024-08-11 10:41:48,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4200, loss[loss=0.1163, beats_loss=0.01092, ecapa_loss=0.0002293, whisper_loss=0.1031, over 14633.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01131, ecapa_loss=0.0002014, whisper_loss=0.09428, over 3872104.83 frames. 
], batch size: 59, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:42:00,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1056470.0, ans=0.125 2024-08-11 10:42:07,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1056570.0, ans=0.1 2024-08-11 10:42:07,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1056570.0, ans=0.0 2024-08-11 10:42:18,090 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 10:42:32,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-11 10:42:35,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1056770.0, ans=0.125 2024-08-11 10:42:46,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1056770.0, ans=0.1 2024-08-11 10:42:48,645 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 10:42:58,342 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-11 10:42:58,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1056870.0, ans=0.0 2024-08-11 10:43:02,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4250, loss[loss=0.1094, beats_loss=0.01194, ecapa_loss=0.0001685, whisper_loss=0.09581, over 14621.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001995, whisper_loss=0.09342, over 3872929.11 frames. 
], batch size: 57, lr: 8.01e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:43:07,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.666e+01 2.925e+01 3.281e+01 5.407e+01, threshold=5.850e+01, percent-clipped=0.0 2024-08-11 10:43:33,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1057170.0, ans=0.1 2024-08-11 10:43:42,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1057170.0, ans=0.1 2024-08-11 10:43:58,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1057270.0, ans=0.0 2024-08-11 10:44:14,832 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 10:44:15,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1057470.0, ans=0.125 2024-08-11 10:44:16,376 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4300, loss[loss=0.1135, beats_loss=0.01199, ecapa_loss=0.0002148, whisper_loss=0.09934, over 21391.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01139, ecapa_loss=0.0001988, whisper_loss=0.09295, over 3866875.05 frames. ], batch size: 86, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:44:35,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1057570.0, ans=0.1 2024-08-11 10:45:15,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1057770.0, ans=0.0 2024-08-11 10:45:17,990 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 10:45:26,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1057870.0, ans=0.125 2024-08-11 10:45:34,052 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4350, loss[loss=0.123, beats_loss=0.01018, ecapa_loss=0.0002346, whisper_loss=0.1105, over 17130.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01132, ecapa_loss=0.0002003, whisper_loss=0.09289, over 3840904.79 frames. ], batch size: 69, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:45:38,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.592e+01 2.860e+01 3.306e+01 4.790e+01, threshold=5.719e+01, percent-clipped=0.0 2024-08-11 10:45:38,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1057970.0, ans=10.0 2024-08-11 10:45:40,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1057970.0, ans=0.125 2024-08-11 10:45:44,706 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 10:45:49,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.17 vs. limit=22.5 2024-08-11 10:45:54,589 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 10:45:57,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-08-11 10:46:05,696 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 10:46:27,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1058270.0, ans=0.05 2024-08-11 10:46:32,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1058270.0, ans=0.125 2024-08-11 10:46:34,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1058370.0, ans=0.125 2024-08-11 10:46:34,074 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.005e-01 2024-08-11 10:46:42,602 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 10:46:46,228 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.370e+02 2024-08-11 10:46:51,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4400, loss[loss=0.1101, beats_loss=0.008701, ecapa_loss=0.0001705, whisper_loss=0.09973, over 18722.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.0002023, whisper_loss=0.09377, over 3860591.87 frames. ], batch size: 71, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:46:56,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1058470.0, ans=0.0 2024-08-11 10:47:01,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-11 10:47:23,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1058670.0, ans=0.125 2024-08-11 10:47:29,607 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
30 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 10:47:34,882 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 10:47:35,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1058670.0, ans=0.125 2024-08-11 10:47:41,479 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-11 10:47:44,536 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 10:47:53,404 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 10:48:02,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2024-08-11 10:48:10,481 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 10:48:13,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4450, loss[loss=0.09631, beats_loss=0.01359, ecapa_loss=0.0002049, whisper_loss=0.08067, over 20554.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01125, ecapa_loss=0.000202, whisper_loss=0.09348, over 3890922.51 frames. 
], batch size: 87, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:48:13,903 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.194e-03 2024-08-11 10:48:17,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.805e+01 3.007e+01 3.333e+01 6.979e+01, threshold=6.014e+01, percent-clipped=1.0 2024-08-11 10:48:18,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1058970.0, ans=0.2 2024-08-11 10:48:18,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-08-11 10:48:38,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059070.0, ans=0.1 2024-08-11 10:48:47,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1059170.0, ans=0.2 2024-08-11 10:48:54,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1059170.0, ans=0.07 2024-08-11 10:48:59,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1059270.0, ans=0.0 2024-08-11 10:48:59,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-11 10:49:15,492 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 10:49:18,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. 
limit=6.0 2024-08-11 10:49:24,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1059370.0, ans=0.0 2024-08-11 10:49:25,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1059370.0, ans=0.125 2024-08-11 10:49:27,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-08-11 10:49:29,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4500, loss[loss=0.1111, beats_loss=0.01024, ecapa_loss=0.0001833, whisper_loss=0.09904, over 23136.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01131, ecapa_loss=0.0002011, whisper_loss=0.09306, over 3899071.71 frames. ], batch size: 88, lr: 8.00e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:49:56,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059570.0, ans=0.1 2024-08-11 10:49:59,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5 2024-08-11 10:50:05,385 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 10:50:11,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059670.0, ans=0.1 2024-08-11 10:50:16,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2024-08-11 10:50:44,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4550, loss[loss=0.1033, beats_loss=0.01318, ecapa_loss=0.0001927, whisper_loss=0.08822, over 22049.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0002008, whisper_loss=0.09314, over 3904088.31 frames. ], batch size: 90, lr: 7.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-11 10:50:48,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.557e+01 2.865e+01 3.375e+01 6.211e+01, threshold=5.730e+01, percent-clipped=1.0 2024-08-11 10:50:55,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1059970.0, ans=0.04949747468305833 2024-08-11 10:51:04,872 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 10:51:19,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1060170.0, ans=0.125 2024-08-11 10:51:19,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1060170.0, ans=0.07 2024-08-11 10:51:20,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=12.0 2024-08-11 10:51:22,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-11 10:51:28,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1060170.0, ans=0.0 2024-08-11 10:51:31,089 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.348e-02 2024-08-11 10:51:37,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-11 10:51:45,843 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
29 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 10:51:58,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1060470.0, ans=0.125 2024-08-11 10:52:00,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4600, loss[loss=0.08473, beats_loss=0.01395, ecapa_loss=0.0001761, whisper_loss=0.06901, over 16845.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0002008, whisper_loss=0.09311, over 3892134.57 frames. ], batch size: 67, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:52:04,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-11 10:52:07,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060470.0, ans=0.1 2024-08-11 10:52:37,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1060670.0, ans=0.125 2024-08-11 10:53:00,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1060770.0, ans=0.2 2024-08-11 10:53:07,199 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 30 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 10:53:12,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1060870.0, ans=0.1 2024-08-11 10:53:15,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.899e+02 2024-08-11 10:53:21,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4650, loss[loss=0.09582, beats_loss=0.01212, ecapa_loss=0.0001754, whisper_loss=0.08194, over 17953.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01137, ecapa_loss=0.0002004, whisper_loss=0.09296, over 3908844.98 frames. 
], batch size: 73, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:53:26,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.723e+01 3.113e+01 3.495e+01 7.663e+01, threshold=6.226e+01, percent-clipped=1.0 2024-08-11 10:53:42,411 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 35 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-11 10:53:49,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-08-11 10:53:53,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1061170.0, ans=0.125 2024-08-11 10:54:08,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1061270.0, ans=0.0 2024-08-11 10:54:20,150 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 10:54:30,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2024-08-11 10:54:35,121 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 10:54:35,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1061370.0, ans=0.2 2024-08-11 10:54:42,652 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4700, loss[loss=0.09548, beats_loss=0.01372, ecapa_loss=0.0002014, whisper_loss=0.07975, over 21986.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001991, whisper_loss=0.09342, over 3904077.78 frames. 
], batch size: 93, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:54:46,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1061470.0, ans=0.125 2024-08-11 10:55:03,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-08-11 10:55:09,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1061570.0, ans=0.125 2024-08-11 10:55:20,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1061670.0, ans=0.0 2024-08-11 10:55:21,846 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 10:55:22,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.73 vs. limit=10.0 2024-08-11 10:55:26,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1061670.0, ans=0.125 2024-08-11 10:56:04,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1061970.0, ans=0.125 2024-08-11 10:56:05,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4750, loss[loss=0.08604, beats_loss=0.01087, ecapa_loss=0.0002223, whisper_loss=0.07294, over 13555.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01139, ecapa_loss=0.0002009, whisper_loss=0.09304, over 3891078.07 frames. 
], batch size: 54, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:56:07,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1061970.0, ans=0.125 2024-08-11 10:56:10,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+01 2.759e+01 3.104e+01 3.569e+01 5.241e+01, threshold=6.207e+01, percent-clipped=0.0 2024-08-11 10:56:23,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1062070.0, ans=0.2 2024-08-11 10:56:51,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1062170.0, ans=0.1 2024-08-11 10:57:25,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1062370.0, ans=0.125 2024-08-11 10:57:31,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4800, loss[loss=0.08473, beats_loss=0.01129, ecapa_loss=0.0002533, whisper_loss=0.0709, over 20208.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01146, ecapa_loss=0.0001998, whisper_loss=0.09229, over 3865997.14 frames. ], batch size: 91, lr: 7.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:57:49,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1062570.0, ans=0.05 2024-08-11 10:57:51,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-11 10:57:58,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1062570.0, ans=0.2 2024-08-11 10:57:58,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=15.0 2024-08-11 10:58:17,778 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 10:58:45,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.10 vs. limit=10.0 2024-08-11 10:58:48,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1062870.0, ans=0.125 2024-08-11 10:58:55,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4850, loss[loss=0.1131, beats_loss=0.0102, ecapa_loss=0.0001962, whisper_loss=0.1009, over 19897.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01146, ecapa_loss=0.0002002, whisper_loss=0.09291, over 3911971.83 frames. ], batch size: 78, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 10:59:00,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.634e+01 3.190e+01 3.671e+01 5.547e+01, threshold=6.379e+01, percent-clipped=0.0 2024-08-11 10:59:29,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1063170.0, ans=0.125 2024-08-11 10:59:34,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1063170.0, ans=0.125 2024-08-11 10:59:40,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=12.0 2024-08-11 10:59:54,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1063270.0, ans=0.125 2024-08-11 11:00:09,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1063370.0, ans=0.2 2024-08-11 11:00:11,753 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
27 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 11:00:15,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4900, loss[loss=0.09712, beats_loss=0.0126, ecapa_loss=0.0002197, whisper_loss=0.08233, over 21661.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01143, ecapa_loss=0.0002001, whisper_loss=0.09362, over 3916912.49 frames. ], batch size: 91, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:00:52,772 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 11:01:30,834 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 11:01:37,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 4950, loss[loss=0.1059, beats_loss=0.01389, ecapa_loss=0.000171, whisper_loss=0.09027, over 20691.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01136, ecapa_loss=0.0001992, whisper_loss=0.09394, over 3873809.63 frames. ], batch size: 81, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:01:43,721 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.682e+01 3.010e+01 3.354e+01 5.437e+01, threshold=6.020e+01, percent-clipped=0.0 2024-08-11 11:01:51,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1063970.0, ans=0.0 2024-08-11 11:01:53,837 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 11:01:54,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1064070.0, ans=0.1 2024-08-11 11:02:01,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1064070.0, ans=0.1 2024-08-11 11:02:05,978 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 11:02:30,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1064270.0, ans=0.2 2024-08-11 11:02:55,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1064370.0, ans=0.07 2024-08-11 11:03:00,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5000, loss[loss=0.07605, beats_loss=0.01067, ecapa_loss=0.0002752, whisper_loss=0.06263, over 12480.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01128, ecapa_loss=0.0002003, whisper_loss=0.0944, over 3863040.53 frames. ], batch size: 55, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:03:18,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-08-11 11:03:53,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=1064770.0, ans=0.02 2024-08-11 11:04:02,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-11 11:04:15,020 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 11:04:24,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5050, loss[loss=0.1213, beats_loss=0.009492, ecapa_loss=0.0002168, whisper_loss=0.1096, over 19946.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01133, ecapa_loss=0.0002002, whisper_loss=0.09399, over 3875328.84 frames. ], batch size: 79, lr: 7.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:04:27,235 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 11:04:30,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 2.899e+01 3.463e+01 4.526e+01, threshold=5.797e+01, percent-clipped=0.0 2024-08-11 11:04:31,637 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 11:04:46,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1065070.0, ans=0.125 2024-08-11 11:04:46,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1065070.0, ans=0.125 2024-08-11 11:05:12,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0 2024-08-11 11:05:15,480 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 11:05:36,846 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 11:05:46,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1065370.0, ans=0.1 2024-08-11 11:05:47,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2024-08-11 11:05:52,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1065470.0, ans=0.125 2024-08-11 11:05:53,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5100, loss[loss=0.1158, beats_loss=0.009836, ecapa_loss=0.0001958, whisper_loss=0.104, over 20828.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0113, ecapa_loss=0.000199, whisper_loss=0.09408, over 3902606.43 frames. 
], batch size: 81, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:06:03,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0 2024-08-11 11:06:08,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065470.0, ans=0.1 2024-08-11 11:06:29,372 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 11:06:41,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1065670.0, ans=0.125 2024-08-11 11:06:54,094 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 11:06:54,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1065770.0, ans=0.07 2024-08-11 11:07:01,886 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 11:07:04,348 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 11:07:07,891 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 11:07:16,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5150, loss[loss=0.1177, beats_loss=0.009231, ecapa_loss=0.0002161, whisper_loss=0.1063, over 17545.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01127, ecapa_loss=0.0002005, whisper_loss=0.09462, over 3918184.74 frames. 
], batch size: 70, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:07:17,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1065970.0, ans=0.1 2024-08-11 11:07:22,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.743e+01 3.078e+01 3.597e+01 5.105e+01, threshold=6.156e+01, percent-clipped=0.0 2024-08-11 11:07:24,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1065970.0, ans=0.125 2024-08-11 11:07:28,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1065970.0, ans=0.0 2024-08-11 11:07:42,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1066070.0, ans=0.125 2024-08-11 11:07:46,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.47 vs. 
limit=15.0 2024-08-11 11:07:49,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1066170.0, ans=0.1 2024-08-11 11:07:52,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1066170.0, ans=0.125 2024-08-11 11:07:52,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1066170.0, ans=0.125 2024-08-11 11:07:58,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1066170.0, ans=0.1 2024-08-11 11:08:04,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1066270.0, ans=0.125 2024-08-11 11:08:06,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1066270.0, ans=0.04949747468305833 2024-08-11 11:08:21,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1066370.0, ans=15.0 2024-08-11 11:08:30,539 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 11:08:32,205 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 11:08:32,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1066470.0, ans=0.125 2024-08-11 11:08:33,326 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5200, loss[loss=0.08988, beats_loss=0.01163, ecapa_loss=0.000214, whisper_loss=0.07611, over 15079.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01124, ecapa_loss=0.0001992, whisper_loss=0.09405, over 3880985.94 frames. 
], batch size: 62, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:08:43,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1066470.0, ans=0.125 2024-08-11 11:08:58,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1066570.0, ans=0.05 2024-08-11 11:09:02,517 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 11:09:12,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1066670.0, ans=0.125 2024-08-11 11:09:24,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1066770.0, ans=0.0 2024-08-11 11:09:52,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5250, loss[loss=0.07828, beats_loss=0.01515, ecapa_loss=0.000154, whisper_loss=0.06159, over 23420.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01128, ecapa_loss=0.0002003, whisper_loss=0.09294, over 3882791.98 frames. 
], batch size: 93, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:09:57,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.555e+01 2.975e+01 3.407e+01 4.666e+01, threshold=5.951e+01, percent-clipped=0.0 2024-08-11 11:10:08,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1067070.0, ans=0.0 2024-08-11 11:10:15,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1067070.0, ans=0.125 2024-08-11 11:10:18,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1067070.0, ans=0.025 2024-08-11 11:10:32,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1067170.0, ans=0.125 2024-08-11 11:11:10,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5300, loss[loss=0.1082, beats_loss=0.01316, ecapa_loss=0.0001707, whisper_loss=0.09332, over 20959.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0002003, whisper_loss=0.09358, over 3904462.67 frames. 
], batch size: 83, lr: 7.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:11:41,321 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.196e-01 2024-08-11 11:11:50,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1067670.0, ans=0.05 2024-08-11 11:11:54,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1067770.0, ans=0.1 2024-08-11 11:11:55,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1067770.0, ans=0.025 2024-08-11 11:12:00,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1067770.0, ans=0.125 2024-08-11 11:12:09,624 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 11:12:29,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5350, loss[loss=0.09961, beats_loss=0.0134, ecapa_loss=0.0001707, whisper_loss=0.0845, over 22026.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001991, whisper_loss=0.09302, over 3883379.49 frames. 
], batch size: 89, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:12:36,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.785e+01 3.077e+01 3.493e+01 6.327e+01, threshold=6.155e+01, percent-clipped=1.0 2024-08-11 11:13:09,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1068070.0, ans=0.2 2024-08-11 11:13:33,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1068270.0, ans=0.0 2024-08-11 11:13:36,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1068270.0, ans=0.0 2024-08-11 11:13:57,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1068370.0, ans=0.09899494936611666 2024-08-11 11:14:08,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1068370.0, ans=0.0 2024-08-11 11:14:14,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1068470.0, ans=0.0 2024-08-11 11:14:15,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5400, loss[loss=0.1092, beats_loss=0.01076, ecapa_loss=0.0002398, whisper_loss=0.09605, over 14596.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.0001978, whisper_loss=0.09258, over 3874755.24 frames. ], batch size: 61, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:14:15,542 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 11:14:20,085 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-11 11:14:28,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:14:35,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1068570.0, ans=0.0 2024-08-11 11:14:42,295 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-11 11:14:45,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1068570.0, ans=0.125 2024-08-11 11:15:12,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1068770.0, ans=0.125 2024-08-11 11:15:24,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1068770.0, ans=0.2 2024-08-11 11:15:30,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1068870.0, ans=0.0 2024-08-11 11:15:36,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068870.0, ans=0.1 2024-08-11 11:15:43,044 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 11:15:50,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5450, loss[loss=0.08563, beats_loss=0.01274, ecapa_loss=0.0002131, whisper_loss=0.07076, over 18866.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0113, ecapa_loss=0.0001994, whisper_loss=0.09291, over 3887166.90 frames. ], batch size: 78, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:15:51,073 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 11:15:57,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.869e+01 3.117e+01 3.592e+01 6.207e+01, threshold=6.234e+01, percent-clipped=1.0 2024-08-11 11:15:57,546 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 12 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 11:16:02,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-08-11 11:16:09,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-08-11 11:16:14,401 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 11:17:02,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1069270.0, ans=0.1 2024-08-11 11:17:21,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1069370.0, ans=0.0 2024-08-11 11:17:26,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1069370.0, ans=0.1 2024-08-11 11:17:35,743 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5500, loss[loss=0.1037, beats_loss=0.01339, ecapa_loss=0.0001688, whisper_loss=0.08863, over 21848.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01142, ecapa_loss=0.0001981, whisper_loss=0.09253, over 3893449.51 frames. ], batch size: 87, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:17:53,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1069470.0, ans=0.125 2024-08-11 11:18:08,750 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
15 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 11:18:26,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1069670.0, ans=0.1 2024-08-11 11:18:42,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1069770.0, ans=0.125 2024-08-11 11:19:22,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5550, loss[loss=0.09597, beats_loss=0.01262, ecapa_loss=0.000164, whisper_loss=0.08171, over 15276.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01145, ecapa_loss=0.0001976, whisper_loss=0.09205, over 3872660.06 frames. ], batch size: 59, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:19:23,891 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-11 11:19:28,685 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.609e+01 2.954e+01 3.474e+01 6.484e+01, threshold=5.909e+01, percent-clipped=2.0 2024-08-11 11:19:29,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1069970.0, ans=0.125 2024-08-11 11:19:31,824 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 11:20:50,073 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 11:20:50,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1070370.0, ans=0.125 2024-08-11 11:20:51,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1070370.0, ans=0.0 2024-08-11 11:20:54,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5600, loss[loss=0.1394, beats_loss=0.008507, ecapa_loss=0.0001967, whisper_loss=0.1289, over 21416.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01137, ecapa_loss=0.0001992, whisper_loss=0.09245, over 3894335.37 frames. ], batch size: 84, lr: 7.96e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:21:08,239 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-11 11:21:14,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2024-08-11 11:21:20,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1070670.0, ans=0.0 2024-08-11 11:21:26,734 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 11:21:34,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0 2024-08-11 11:21:42,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1070770.0, ans=0.125 2024-08-11 11:21:54,259 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-11 11:21:57,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1070870.0, ans=0.1 2024-08-11 11:22:07,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5650, loss[loss=0.108, beats_loss=0.01214, ecapa_loss=0.0002352, whisper_loss=0.09353, over 21502.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0001999, whisper_loss=0.09246, over 3892433.48 frames. 
], batch size: 90, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:22:07,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1070970.0, ans=0.2 2024-08-11 11:22:11,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.577e+01 2.929e+01 3.455e+01 8.964e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 11:22:14,741 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 11:22:33,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071070.0, ans=0.1 2024-08-11 11:22:36,212 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 11:22:38,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1071170.0, ans=0.125 2024-08-11 11:22:44,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1071170.0, ans=10.0 2024-08-11 11:22:50,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-08-11 11:22:55,387 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 11:22:58,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=12.0 2024-08-11 11:23:05,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-08-11 11:23:09,480 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 11:23:09,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1071370.0, ans=0.0 2024-08-11 11:23:16,150 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 11:23:21,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1071370.0, ans=0.125 2024-08-11 11:23:25,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5700, loss[loss=0.07142, beats_loss=0.01565, ecapa_loss=0.0001749, whisper_loss=0.05403, over 17858.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01135, ecapa_loss=0.0001997, whisper_loss=0.09266, over 3925284.94 frames. ], batch size: 75, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:23:50,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1071570.0, ans=0.125 2024-08-11 11:23:58,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071670.0, ans=0.1 2024-08-11 11:24:02,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.63 vs. 
limit=10.0 2024-08-11 11:24:09,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1071770.0, ans=0.125 2024-08-11 11:24:11,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1071770.0, ans=0.0 2024-08-11 11:24:13,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1071770.0, ans=0.0 2024-08-11 11:24:16,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2024-08-11 11:24:18,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1071770.0, ans=0.0 2024-08-11 11:24:43,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5750, loss[loss=0.09059, beats_loss=0.01364, ecapa_loss=0.0001293, whisper_loss=0.07565, over 23325.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01142, ecapa_loss=0.0001994, whisper_loss=0.09241, over 3912230.59 frames. ], batch size: 90, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:24:48,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.11 vs. 
limit=22.5 2024-08-11 11:24:48,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.718e+01 3.107e+01 3.541e+01 5.804e+01, threshold=6.214e+01, percent-clipped=0.0 2024-08-11 11:24:50,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1071970.0, ans=0.125 2024-08-11 11:24:57,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1072070.0, ans=0.125 2024-08-11 11:24:58,998 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 11:25:14,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1072170.0, ans=0.0 2024-08-11 11:25:27,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1072170.0, ans=0.0 2024-08-11 11:25:28,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1072170.0, ans=0.125 2024-08-11 11:25:34,642 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 11:25:35,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1072270.0, ans=0.125 2024-08-11 11:25:37,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1072270.0, ans=0.125 2024-08-11 11:25:39,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. 
limit=12.0 2024-08-11 11:25:43,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1072270.0, ans=0.0 2024-08-11 11:25:48,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=12.0 2024-08-11 11:26:02,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-08-11 11:26:02,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5800, loss[loss=0.1164, beats_loss=0.01064, ecapa_loss=0.0001916, whisper_loss=0.1038, over 15671.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001996, whisper_loss=0.09315, over 3882329.99 frames. ], batch size: 61, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:26:04,267 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 11:26:06,124 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 11:26:07,806 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 10 from Vox, 37 fro AS 2024-08-11 11:26:31,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1072670.0, ans=0.1 2024-08-11 11:26:33,215 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 11:26:49,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1072770.0, ans=0.2 2024-08-11 11:27:18,623 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5850, loss[loss=0.1175, beats_loss=0.01104, ecapa_loss=0.0001742, whisper_loss=0.1047, over 15501.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01142, ecapa_loss=0.0001992, whisper_loss=0.09298, over 3907485.29 frames. ], batch size: 59, lr: 7.95e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:27:20,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1072970.0, ans=0.125 2024-08-11 11:27:21,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2024-08-11 11:27:23,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.774e+01 3.139e+01 3.627e+01 6.860e+01, threshold=6.277e+01, percent-clipped=1.0 2024-08-11 11:27:31,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1072970.0, ans=0.0 2024-08-11 11:27:32,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1073070.0, ans=0.0 2024-08-11 11:27:33,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1073070.0, ans=0.0 2024-08-11 11:27:38,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1073070.0, ans=0.1 2024-08-11 11:27:49,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1073170.0, ans=0.2 2024-08-11 11:28:31,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5900, loss[loss=0.1259, beats_loss=0.007964, ecapa_loss=0.000196, whisper_loss=0.1159, over 21735.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01127, ecapa_loss=0.0001997, whisper_loss=0.09389, over 3894732.20 frames. ], batch size: 83, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:29:03,424 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 11:29:16,038 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 9 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 11:29:24,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1073770.0, ans=0.125 2024-08-11 11:29:34,042 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 11:29:42,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 5950, loss[loss=0.09429, beats_loss=0.01247, ecapa_loss=0.0001883, whisper_loss=0.07994, over 22715.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01138, ecapa_loss=0.0001988, whisper_loss=0.09263, over 3879490.49 frames. ], batch size: 93, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:29:47,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.700e+01 3.029e+01 3.647e+01 6.302e+01, threshold=6.057e+01, percent-clipped=1.0 2024-08-11 11:29:48,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1073970.0, ans=0.1 2024-08-11 11:29:48,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=12.0 2024-08-11 11:29:58,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1074070.0, ans=0.125 2024-08-11 11:30:17,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1074170.0, ans=0.125 2024-08-11 11:30:41,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. 
limit=15.0 2024-08-11 11:30:42,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1074370.0, ans=0.125 2024-08-11 11:30:46,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1074370.0, ans=0.0 2024-08-11 11:30:47,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-08-11 11:30:50,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1074370.0, ans=0.0 2024-08-11 11:30:53,423 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 11:30:54,771 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 11:30:56,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6000, loss[loss=0.1264, beats_loss=0.01054, ecapa_loss=0.0001932, whisper_loss=0.1139, over 22146.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01143, ecapa_loss=0.000199, whisper_loss=0.09273, over 3904420.97 frames. ], batch size: 86, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:30:56,106 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 11:31:34,846 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on ASR_libri: loss=0.2586, beats_loss=0, ecapa_loss=0.0006404, whisper_loss=0.2522, over 922467.00 frames. 2024-08-11 11:31:52,415 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on SV_voxceleb1: loss=0.005252, beats_loss=0, ecapa_loss=0.0005252, whisper_loss=0, over 939242.00 frames. 
2024-08-11 11:32:29,241 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4088, 4.6127, 5.2130, 5.2948], device='cuda:0') 2024-08-11 11:33:45,228 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on AT_audioset: loss=0.02539, beats_loss=0.02539, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 11:33:45,232 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 11:33:45,489 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 11:34:34,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1074770.0, ans=0.0 2024-08-11 11:34:46,610 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 11:34:58,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6050, loss[loss=0.1201, beats_loss=0.01105, ecapa_loss=0.0001665, whisper_loss=0.1074, over 21419.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01141, ecapa_loss=0.0001987, whisper_loss=0.09258, over 3881405.14 frames. ], batch size: 83, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:35:02,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1074970.0, ans=0.125 2024-08-11 11:35:03,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+01 2.749e+01 3.055e+01 3.427e+01 5.083e+01, threshold=6.111e+01, percent-clipped=0.0 2024-08-11 11:35:06,724 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 11:35:07,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. 
limit=6.0 2024-08-11 11:35:28,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1075170.0, ans=0.1 2024-08-11 11:35:29,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1075170.0, ans=0.125 2024-08-11 11:35:54,578 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 11:36:01,963 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 11:36:02,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1075370.0, ans=0.125 2024-08-11 11:36:14,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6100, loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001753, whisper_loss=0.09337, over 14945.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01143, ecapa_loss=0.0001977, whisper_loss=0.09215, over 3900663.09 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:36:29,669 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 11:36:39,920 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 11:36:48,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1075670.0, ans=6.0 2024-08-11 11:37:04,611 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 11:37:14,603 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
27 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-11 11:37:22,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1075870.0, ans=0.125 2024-08-11 11:37:25,606 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 11 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 11:37:28,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2024-08-11 11:37:30,020 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6150, loss[loss=0.1034, beats_loss=0.009169, ecapa_loss=0.0002437, whisper_loss=0.09177, over 18035.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01137, ecapa_loss=0.0001999, whisper_loss=0.09245, over 3901951.17 frames. ], batch size: 73, lr: 7.94e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:37:34,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.684e+01 3.005e+01 3.339e+01 4.754e+01, threshold=6.009e+01, percent-clipped=0.0 2024-08-11 11:38:08,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-08-11 11:38:27,277 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 11:38:31,768 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 11:38:40,658 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 11:38:43,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6200, loss[loss=0.1262, beats_loss=0.01142, ecapa_loss=0.0001759, whisper_loss=0.113, over 23463.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0002002, whisper_loss=0.09278, over 3887213.31 frames. 
], batch size: 91, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:38:46,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1076470.0, ans=0.05 2024-08-11 11:38:47,869 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 11:38:59,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1076570.0, ans=0.125 2024-08-11 11:38:59,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1076570.0, ans=0.07 2024-08-11 11:39:18,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1076670.0, ans=0.125 2024-08-11 11:39:30,386 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.776e+05 2024-08-11 11:39:38,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1076770.0, ans=0.1 2024-08-11 11:39:59,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6250, loss[loss=0.102, beats_loss=0.01264, ecapa_loss=0.000202, whisper_loss=0.08729, over 22797.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01135, ecapa_loss=0.0002, whisper_loss=0.09263, over 3886292.37 frames. ], batch size: 92, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:40:04,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+01 2.830e+01 2.972e+01 3.439e+01 5.876e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 11:40:07,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1076970.0, ans=0.2 2024-08-11 11:40:13,141 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 11:40:13,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1077070.0, ans=0.2 2024-08-11 11:40:13,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.27 vs. limit=10.0 2024-08-11 11:40:15,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077070.0, ans=0.1 2024-08-11 11:40:16,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1077070.0, ans=15.0 2024-08-11 11:40:46,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1077270.0, ans=0.125 2024-08-11 11:40:49,289 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-11 11:40:53,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1077270.0, ans=0.0 2024-08-11 11:41:06,400 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 11:41:13,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6300, loss[loss=0.09551, beats_loss=0.01371, ecapa_loss=0.0001594, whisper_loss=0.0802, over 19232.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01135, ecapa_loss=0.0001998, whisper_loss=0.09244, over 3878748.77 frames. ], batch size: 76, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:41:37,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2024-08-11 11:41:39,478 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
21 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 11:41:47,467 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 11:41:53,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1077670.0, ans=0.0 2024-08-11 11:41:54,526 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.583e+05 2024-08-11 11:42:20,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1077870.0, ans=0.125 2024-08-11 11:42:24,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1077970.0, ans=0.05 2024-08-11 11:42:24,896 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6350, loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001839, whisper_loss=0.09094, over 13990.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01137, ecapa_loss=0.0002008, whisper_loss=0.09253, over 3877270.10 frames. 
], batch size: 54, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:42:29,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.640e+01 2.866e+01 3.160e+01 1.102e+02, threshold=5.732e+01, percent-clipped=1.0 2024-08-11 11:43:22,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1078270.0, ans=0.125 2024-08-11 11:43:29,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078370.0, ans=0.1 2024-08-11 11:43:34,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1078370.0, ans=0.0 2024-08-11 11:43:37,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2024-08-11 11:43:39,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6400, loss[loss=0.103, beats_loss=0.01025, ecapa_loss=0.0002238, whisper_loss=0.09054, over 21768.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01128, ecapa_loss=0.0001986, whisper_loss=0.09327, over 3899005.72 frames. 
], batch size: 91, lr: 7.93e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:43:53,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1078570.0, ans=0.04949747468305833 2024-08-11 11:44:09,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1078670.0, ans=0.125 2024-08-11 11:44:32,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1078770.0, ans=0.0 2024-08-11 11:44:34,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1078770.0, ans=0.0 2024-08-11 11:44:37,903 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 11:44:53,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1078870.0, ans=0.125 2024-08-11 11:44:55,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6450, loss[loss=0.1024, beats_loss=0.01197, ecapa_loss=0.0002113, whisper_loss=0.08836, over 21678.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.0113, ecapa_loss=0.0001985, whisper_loss=0.09348, over 3903667.46 frames. ], batch size: 91, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:45:01,155 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+01 2.754e+01 3.078e+01 3.674e+01 5.893e+01, threshold=6.156e+01, percent-clipped=1.0 2024-08-11 11:45:01,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1078970.0, ans=0.015 2024-08-11 11:45:12,459 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 11:45:20,991 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 11:45:23,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-11 11:45:23,721 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 11:45:25,186 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 11:45:34,933 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 11:45:47,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1079270.0, ans=0.0 2024-08-11 11:45:51,384 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 11:45:53,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1079370.0, ans=0.125 2024-08-11 11:46:08,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6500, loss[loss=0.1025, beats_loss=0.01344, ecapa_loss=0.0002112, whisper_loss=0.08696, over 20884.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01125, ecapa_loss=0.0001989, whisper_loss=0.0942, over 3914674.55 frames. ], batch size: 87, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:46:13,396 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 11:46:14,829 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 11:46:18,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1079470.0, ans=0.07 2024-08-11 11:46:22,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1079570.0, ans=0.0 2024-08-11 11:46:25,601 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-11 11:46:33,428 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 11:46:33,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2024-08-11 11:46:39,971 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 11:46:44,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1079670.0, ans=0.125 2024-08-11 11:46:56,007 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 11:46:59,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1079770.0, ans=0.1 2024-08-11 11:47:16,254 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 11:47:16,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1079870.0, ans=0.0 2024-08-11 11:47:20,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6550, loss[loss=0.1159, beats_loss=0.0102, ecapa_loss=0.0002114, whisper_loss=0.1036, over 19154.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0001988, whisper_loss=0.09416, over 3928812.94 frames. 
], batch size: 75, lr: 7.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-11 11:47:20,587 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 11:47:23,724 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-108000.pt 2024-08-11 11:47:27,882 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+01 2.781e+01 3.122e+01 3.450e+01 5.322e+01, threshold=6.243e+01, percent-clipped=0.0 2024-08-11 11:47:30,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1079970.0, ans=0.0 2024-08-11 11:47:38,424 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 11:47:42,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1080070.0, ans=0.1 2024-08-11 11:47:46,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.95 vs. limit=22.5 2024-08-11 11:47:48,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1080070.0, ans=0.0 2024-08-11 11:47:48,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-11 11:48:09,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.37 vs. 
limit=15.0
2024-08-11 11:48:15,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1080270.0, ans=0.0
2024-08-11 11:48:17,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1080270.0, ans=0.025
2024-08-11 11:48:19,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0
2024-08-11 11:48:20,013 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS
2024-08-11 11:48:24,912 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 17 from Vox, 40 from AS
2024-08-11 11:48:30,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1080370.0, ans=0.0
2024-08-11 11:48:37,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6600, loss[loss=0.1241, beats_loss=0.01242, ecapa_loss=0.0002328, whisper_loss=0.1094, over 22643.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01128, ecapa_loss=0.0002007, whisper_loss=0.09419, over 3921660.73 frames. ], batch size: 95, lr: 7.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:48:37,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1080470.0, ans=0.125
2024-08-11 11:48:54,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1080570.0, ans=0.0
2024-08-11 11:48:55,358 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS
2024-08-11 11:48:59,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0
2024-08-11 11:49:10,906 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 from AS
2024-08-11 11:49:19,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1080770.0, ans=0.125
2024-08-11 11:49:19,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1080770.0, ans=0.0
2024-08-11 11:49:22,044 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 11:49:37,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1080870.0, ans=0.1
2024-08-11 11:49:38,978 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 from AS
2024-08-11 11:49:40,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1080870.0, ans=0.2
2024-08-11 11:49:41,574 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 10 from Vox, 25 from AS
2024-08-11 11:49:47,001 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 from AS
2024-08-11 11:49:50,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6650, loss[loss=0.1258, beats_loss=0.009017, ecapa_loss=0.00022, whisper_loss=0.1146, over 20703.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01137, ecapa_loss=0.0001991, whisper_loss=0.09368, over 3924927.43 frames. ], batch size: 81, lr: 7.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:49:53,450 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 18 from Vox, 22 from AS
2024-08-11 11:49:53,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1080970.0, ans=0.125
2024-08-11 11:49:54,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.681e+01 2.981e+01 3.448e+01 5.241e+01, threshold=5.962e+01, percent-clipped=0.0
2024-08-11 11:49:56,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1080970.0, ans=0.125
2024-08-11 11:49:56,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1080970.0, ans=0.125
2024-08-11 11:50:00,319 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 11 from Vox, 27 from AS
2024-08-11 11:50:04,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1081070.0, ans=0.125
2024-08-11 11:50:13,046 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 11:50:40,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0
2024-08-11 11:50:59,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1081370.0, ans=0.2
2024-08-11 11:51:01,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6700, loss[loss=0.08866, beats_loss=0.01261, ecapa_loss=0.0002146, whisper_loss=0.07391, over 20674.00 frames. ], tot_loss[loss=0.1078, beats_loss=0.01134, ecapa_loss=0.0002024, whisper_loss=0.09443, over 3901762.24 frames. ], batch size: 86, lr: 7.92e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:51:23,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1081570.0, ans=0.125
2024-08-11 11:51:25,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1081570.0, ans=0.2
2024-08-11 11:51:29,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1081570.0, ans=0.0
2024-08-11 11:52:09,297 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 13 from Vox, 36 from AS
2024-08-11 11:52:09,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1081870.0, ans=0.2
2024-08-11 11:52:09,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1081870.0, ans=0.125
2024-08-11 11:52:14,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6750, loss[loss=0.07485, beats_loss=0.01504, ecapa_loss=0.0001552, whisper_loss=0.05826, over 16882.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01148, ecapa_loss=0.0002009, whisper_loss=0.09341, over 3890644.30 frames. ], batch size: 69, lr: 7.91e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:52:18,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.101e+01 2.942e+01 3.557e+01 4.197e+01 2.407e+02, threshold=7.114e+01, percent-clipped=7.0
2024-08-11 11:52:19,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1081970.0, ans=0.0
2024-08-11 11:52:19,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0
2024-08-11 11:52:23,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1081970.0, ans=0.125
2024-08-11 11:52:26,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0
2024-08-11 11:52:35,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1082070.0, ans=0.125
2024-08-11 11:52:42,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5
2024-08-11 11:52:43,755 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 from AS
2024-08-11 11:52:51,516 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 14 from Vox, 41 from AS
2024-08-11 11:52:51,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1082170.0, ans=0.1
2024-08-11 11:52:55,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=10.0
2024-08-11 11:53:15,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1082370.0, ans=0.1
2024-08-11 11:53:17,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1082370.0, ans=0.125
2024-08-11 11:53:20,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1082370.0, ans=0.1
2024-08-11 11:53:20,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1082370.0, ans=0.125
2024-08-11 11:53:22,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0
2024-08-11 11:53:23,176 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 10 from Vox, 26 from AS
2024-08-11 11:53:27,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6800, loss[loss=0.1085, beats_loss=0.01189, ecapa_loss=0.0001965, whisper_loss=0.09463, over 16140.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01151, ecapa_loss=0.0001997, whisper_loss=0.09279, over 3888718.18 frames. ], batch size: 64, lr: 7.91e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:53:37,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1082470.0, ans=0.035
2024-08-11 11:53:52,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1082570.0, ans=0.125
2024-08-11 11:54:00,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0
2024-08-11 11:54:03,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1082670.0, ans=0.1
2024-08-11 11:54:03,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1082670.0, ans=0.0
2024-08-11 11:54:37,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1082870.0, ans=0.02
2024-08-11 11:54:39,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6850, loss[loss=0.095, beats_loss=0.01115, ecapa_loss=0.0002159, whisper_loss=0.0817, over 14960.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01143, ecapa_loss=0.0002008, whisper_loss=0.09273, over 3882987.08 frames. ], batch size: 61, lr: 7.91e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:54:43,219 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-11 11:54:44,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.694e+01 2.999e+01 3.363e+01 5.238e+01, threshold=5.998e+01, percent-clipped=0.0
2024-08-11 11:54:50,008 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 from AS
2024-08-11 11:55:00,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1083070.0, ans=0.125
2024-08-11 11:55:08,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1083170.0, ans=0.125
2024-08-11 11:55:11,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1083170.0, ans=0.125
2024-08-11 11:55:18,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1083170.0, ans=0.125
2024-08-11 11:55:19,644 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.207e+01
2024-08-11 11:55:20,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1083170.0, ans=0.125
2024-08-11 11:55:21,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0
2024-08-11 11:55:36,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1083370.0, ans=0.0
2024-08-11 11:55:49,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1083470.0, ans=0.125
2024-08-11 11:55:49,781 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6900, loss[loss=0.08762, beats_loss=0.01291, ecapa_loss=0.000164, whisper_loss=0.07307, over 19641.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01147, ecapa_loss=0.0002004, whisper_loss=0.09241, over 3889388.36 frames. ], batch size: 76, lr: 7.91e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:56:14,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5
2024-08-11 11:56:15,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0
2024-08-11 11:56:28,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0
2024-08-11 11:56:46,938 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 from AS
2024-08-11 11:56:56,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1083970.0, ans=0.125
2024-08-11 11:56:57,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 6950, loss[loss=0.115, beats_loss=0.00945, ecapa_loss=0.0001925, whisper_loss=0.1036, over 22961.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01152, ecapa_loss=0.0001994, whisper_loss=0.09239, over 3888889.36 frames. ], batch size: 90, lr: 7.91e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:56:57,493 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS
2024-08-11 11:57:01,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.671e+01 2.938e+01 3.749e+01 5.482e+01, threshold=5.876e+01, percent-clipped=0.0
2024-08-11 11:57:03,258 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 from AS
2024-08-11 11:57:06,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1083970.0, ans=0.2
2024-08-11 11:57:31,325 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 from AS
2024-08-11 11:57:42,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1084270.0, ans=0.125
2024-08-11 11:57:49,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5
2024-08-11 11:57:52,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1084370.0, ans=0.0
2024-08-11 11:57:58,060 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 from AS
2024-08-11 11:57:58,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1084370.0, ans=0.125
2024-08-11 11:57:59,336 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 from AS
2024-08-11 11:58:02,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1084370.0, ans=0.125
2024-08-11 11:58:04,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7000, loss[loss=0.121, beats_loss=0.00993, ecapa_loss=0.0001591, whisper_loss=0.1095, over 15577.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01145, ecapa_loss=0.0001996, whisper_loss=0.09267, over 3862796.60 frames. ], batch size: 57, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:58:11,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0
2024-08-11 11:58:17,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1084570.0, ans=0.125
2024-08-11 11:58:21,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1084570.0, ans=0.125
2024-08-11 11:58:27,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1084570.0, ans=0.0
2024-08-11 11:58:28,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.96 vs. limit=22.5
2024-08-11 11:58:55,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=12.0
2024-08-11 11:59:11,983 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7050, loss[loss=0.1197, beats_loss=0.01288, ecapa_loss=0.0001848, whisper_loss=0.105, over 14140.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01146, ecapa_loss=0.0001995, whisper_loss=0.09305, over 3878084.54 frames. ], batch size: 54, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 11:59:13,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1084970.0, ans=0.1
2024-08-11 11:59:14,858 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 from AS
2024-08-11 11:59:15,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.647e+01 2.921e+01 3.539e+01 5.654e+01, threshold=5.842e+01, percent-clipped=0.0
2024-08-11 11:59:26,991 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 from AS
2024-08-11 11:59:37,311 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 11:59:37,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1085170.0, ans=0.035
2024-08-11 11:59:50,104 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.270e-03
2024-08-11 11:59:56,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1085270.0, ans=0.0
2024-08-11 11:59:56,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1085270.0, ans=0.2
2024-08-11 12:00:08,879 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS
2024-08-11 12:00:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1085370.0, ans=0.125
2024-08-11 12:00:11,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1085370.0, ans=0.2
2024-08-11 12:00:11,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0
2024-08-11 12:00:19,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7100, loss[loss=0.1002, beats_loss=0.01165, ecapa_loss=0.000212, whisper_loss=0.08645, over 23079.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01146, ecapa_loss=0.0001981, whisper_loss=0.09249, over 3889510.53 frames. ], batch size: 97, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:00:27,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0
2024-08-11 12:00:28,803 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-11 12:00:37,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1085570.0, ans=0.0
2024-08-11 12:00:40,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1085570.0, ans=0.125
2024-08-11 12:00:43,129 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 18 from Vox, 50 from AS
2024-08-11 12:00:48,661 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-11 12:01:02,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1085770.0, ans=0.125
2024-08-11 12:01:03,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1085770.0, ans=0.0
2024-08-11 12:01:04,437 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 30 from LS+wenet, 20 from Vox, 20 from AS
2024-08-11 12:01:17,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1085870.0, ans=0.0
2024-08-11 12:01:24,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1085970.0, ans=0.0
2024-08-11 12:01:25,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7150, loss[loss=0.1224, beats_loss=0.009316, ecapa_loss=0.000202, whisper_loss=0.1111, over 22566.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01147, ecapa_loss=0.0001983, whisper_loss=0.09247, over 3911677.13 frames. ], batch size: 87, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:01:29,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+01 2.825e+01 3.133e+01 3.530e+01 6.975e+01, threshold=6.267e+01, percent-clipped=1.0
2024-08-11 12:02:32,426 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7200, loss[loss=0.108, beats_loss=0.01276, ecapa_loss=0.0001481, whisper_loss=0.09375, over 19494.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01147, ecapa_loss=0.0001977, whisper_loss=0.09225, over 3922846.10 frames. ], batch size: 72, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:02:38,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1086470.0, ans=0.125
2024-08-11 12:02:58,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1086670.0, ans=0.0
2024-08-11 12:02:58,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1086670.0, ans=0.0
2024-08-11 12:03:05,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2024-08-11 12:03:17,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1086770.0, ans=0.125
2024-08-11 12:03:28,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1086870.0, ans=0.125
2024-08-11 12:03:31,372 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 12:03:34,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1086870.0, ans=0.125
2024-08-11 12:03:37,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
2024-08-11 12:03:40,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7250, loss[loss=0.08733, beats_loss=0.01416, ecapa_loss=0.0001716, whisper_loss=0.07146, over 21456.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01155, ecapa_loss=0.0001978, whisper_loss=0.09164, over 3914298.24 frames. ], batch size: 89, lr: 7.90e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:03:44,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.767e+01 3.129e+01 3.597e+01 6.037e+01, threshold=6.257e+01, percent-clipped=0.0
2024-08-11 12:03:46,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0
2024-08-11 12:03:59,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.44 vs. limit=22.5
2024-08-11 12:04:03,034 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 29 from LS+wenet, 15 from Vox, 23 from AS
2024-08-11 12:04:15,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.74 vs. limit=22.5
2024-08-11 12:04:18,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1087170.0, ans=0.0
2024-08-11 12:04:47,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7300, loss[loss=0.1279, beats_loss=0.009321, ecapa_loss=0.0002277, whisper_loss=0.1163, over 22857.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01149, ecapa_loss=0.0001976, whisper_loss=0.09308, over 3955206.29 frames. ], batch size: 91, lr: 7.89e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:04:52,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1087470.0, ans=0.125
2024-08-11 12:04:54,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1087470.0, ans=0.125
2024-08-11 12:04:54,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1087470.0, ans=0.0
2024-08-11 12:04:55,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1087470.0, ans=0.125
2024-08-11 12:04:57,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1087470.0, ans=0.1
2024-08-11 12:04:58,645 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 29 from Vox, 27 from AS
2024-08-11 12:04:58,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1087470.0, ans=0.125
2024-08-11 12:05:02,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1087570.0, ans=0.125
2024-08-11 12:05:25,936 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS
2024-08-11 12:05:55,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7350, loss[loss=0.08691, beats_loss=0.01413, ecapa_loss=0.0001597, whisper_loss=0.07118, over 14145.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01139, ecapa_loss=0.0001976, whisper_loss=0.09351, over 3906710.70 frames. ], batch size: 56, lr: 7.89e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:05:59,592 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.629e+01 2.975e+01 3.413e+01 5.829e+01, threshold=5.951e+01, percent-clipped=0.0
2024-08-11 12:06:02,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1087970.0, ans=0.1
2024-08-11 12:06:05,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1087970.0, ans=0.125
2024-08-11 12:06:11,079 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 12:06:14,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1088070.0, ans=0.05
2024-08-11 12:06:18,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2024-08-11 12:06:27,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1088170.0, ans=0.025
2024-08-11 12:06:45,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1088270.0, ans=0.125
2024-08-11 12:06:47,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1088270.0, ans=0.0
2024-08-11 12:07:03,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7400, loss[loss=0.1054, beats_loss=0.01101, ecapa_loss=0.0002288, whisper_loss=0.0921, over 22035.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01148, ecapa_loss=0.0001984, whisper_loss=0.0927, over 3929807.09 frames. ], batch size: 92, lr: 7.89e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:07:07,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1088470.0, ans=0.1
2024-08-11 12:07:20,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1088570.0, ans=0.0
2024-08-11 12:07:40,806 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 from AS
2024-08-11 12:07:42,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1088770.0, ans=0.125
2024-08-11 12:07:43,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1088770.0, ans=0.125
2024-08-11 12:07:51,561 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 16 from Vox, 42 from AS
2024-08-11 12:08:10,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7450, loss[loss=0.1125, beats_loss=0.009554, ecapa_loss=0.0002094, whisper_loss=0.1008, over 16174.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01136, ecapa_loss=0.0002004, whisper_loss=0.09352, over 3892519.29 frames. ], batch size: 63, lr: 7.89e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:08:14,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.716e+01 3.101e+01 3.669e+01 6.917e+01, threshold=6.202e+01, percent-clipped=1.0
2024-08-11 12:08:48,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=12.0
2024-08-11 12:09:09,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5
2024-08-11 12:09:14,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1089370.0, ans=0.1
2024-08-11 12:09:21,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7500, loss[loss=0.1105, beats_loss=0.01114, ecapa_loss=0.0002082, whisper_loss=0.09724, over 17091.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01137, ecapa_loss=0.0001999, whisper_loss=0.09409, over 3916698.72 frames. ], batch size: 68, lr: 7.89e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:09:27,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1089470.0, ans=0.0
2024-08-11 12:09:28,322 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 10 from Vox, 27 from AS
2024-08-11 12:09:31,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1089470.0, ans=0.0
2024-08-11 12:09:45,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1089570.0, ans=0.125
2024-08-11 12:09:46,824 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS
2024-08-11 12:10:15,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1089770.0, ans=0.125
2024-08-11 12:10:16,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1089770.0, ans=0.09899494936611666
2024-08-11 12:10:19,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1089870.0, ans=0.0
2024-08-11 12:10:32,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7550, loss[loss=0.1068, beats_loss=0.00893, ecapa_loss=0.0002397, whisper_loss=0.09546, over 16058.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01136, ecapa_loss=0.0002007, whisper_loss=0.09413, over 3907099.21 frames. ], batch size: 64, lr: 7.88e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:10:36,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+01 2.654e+01 2.939e+01 3.334e+01 5.450e+01, threshold=5.879e+01, percent-clipped=0.0
2024-08-11 12:10:40,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5
2024-08-11 12:10:41,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=54.38 vs. limit=22.5
2024-08-11 12:10:47,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1090070.0, ans=0.2
2024-08-11 12:11:04,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0
2024-08-11 12:11:06,725 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 14 from Vox, 44 from AS
2024-08-11 12:11:08,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1090170.0, ans=0.0
2024-08-11 12:11:08,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.29 vs. limit=22.5
2024-08-11 12:11:09,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1090170.0, ans=0.5
2024-08-11 12:11:24,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1090270.0, ans=0.125
2024-08-11 12:11:24,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1090270.0, ans=0.09899494936611666
2024-08-11 12:11:25,703 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 from AS
2024-08-11 12:11:34,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5
2024-08-11 12:11:40,996 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 from AS
2024-08-11 12:11:41,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.39 vs. limit=22.5
2024-08-11 12:11:44,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7600, loss[loss=0.1282, beats_loss=0.007645, ecapa_loss=0.0002129, whisper_loss=0.1185, over 22379.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01137, ecapa_loss=0.0002004, whisper_loss=0.09408, over 3911424.23 frames. ], batch size: 87, lr: 7.88e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:12:06,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1090570.0, ans=0.125
2024-08-11 12:12:08,806 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS
2024-08-11 12:12:09,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5
2024-08-11 12:12:32,243 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 from AS
2024-08-11 12:12:48,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1090870.0, ans=0.0
2024-08-11 12:12:52,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7650, loss[loss=0.1097, beats_loss=0.00935, ecapa_loss=0.0002408, whisper_loss=0.09794, over 22215.00 frames. ], tot_loss[loss=0.1083, beats_loss=0.01124, ecapa_loss=0.0002009, whisper_loss=0.09509, over 3906150.46 frames. ], batch size: 91, lr: 7.88e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:12:52,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1090970.0, ans=0.0
2024-08-11 12:12:56,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+01 2.822e+01 3.132e+01 3.571e+01 5.523e+01, threshold=6.263e+01, percent-clipped=0.0
2024-08-11 12:13:18,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1091170.0, ans=0.0
2024-08-11 12:13:18,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1091170.0, ans=0.125
2024-08-11 12:13:20,013 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 24 from Vox, 26 from AS
2024-08-11 12:13:47,883 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 from AS
2024-08-11 12:13:59,605 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7700, loss[loss=0.1288, beats_loss=0.009892, ecapa_loss=0.0002137, whisper_loss=0.1168, over 19070.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01124, ecapa_loss=0.0002009, whisper_loss=0.09481, over 3913471.59 frames. ], batch size: 74, lr: 7.88e-03, grad_scale: 5.764607523034235e+17
2024-08-11 12:14:02,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0
2024-08-11 12:14:02,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1091470.0, ans=0.125
2024-08-11 12:14:05,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1091470.0, ans=0.125
2024-08-11 12:14:07,947 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 from AS
2024-08-11 12:14:22,536 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 from AS
2024-08-11 12:14:25,339 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS
2024-08-11 12:14:49,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0
2024-08-11 12:15:03,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.96 vs. limit=15.0
2024-08-11 12:15:05,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7750, loss[loss=0.09474, beats_loss=0.01377, ecapa_loss=0.0001677, whisper_loss=0.07929, over 22478.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01135, ecapa_loss=0.0002002, whisper_loss=0.09375, over 3925699.35 frames.
], batch size: 91, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:15:10,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.105e+01 2.756e+01 3.140e+01 3.838e+01 1.235e+02, threshold=6.279e+01, percent-clipped=2.0 2024-08-11 12:15:22,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1092070.0, ans=0.95 2024-08-11 12:15:36,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1092170.0, ans=0.04949747468305833 2024-08-11 12:15:52,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2024-08-11 12:16:10,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7800, loss[loss=0.09687, beats_loss=0.01227, ecapa_loss=0.0001727, whisper_loss=0.08287, over 20752.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01131, ecapa_loss=0.0002012, whisper_loss=0.09368, over 3935622.20 frames. ], batch size: 81, lr: 7.88e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:16:19,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1092470.0, ans=0.125 2024-08-11 12:16:21,979 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 12:16:46,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1092670.0, ans=15.0 2024-08-11 12:16:48,279 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 12:17:00,238 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-11 12:17:02,737 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 12:17:06,800 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 12:17:07,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1092870.0, ans=0.125 2024-08-11 12:17:09,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1092870.0, ans=0.0 2024-08-11 12:17:17,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7850, loss[loss=0.1055, beats_loss=0.01257, ecapa_loss=0.0002009, whisper_loss=0.09095, over 14452.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01137, ecapa_loss=0.0002004, whisper_loss=0.09297, over 3909325.00 frames. ], batch size: 56, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:17:19,396 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:17:21,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+01 2.734e+01 3.036e+01 3.446e+01 5.621e+01, threshold=6.073e+01, percent-clipped=0.0 2024-08-11 12:17:21,698 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 12:17:24,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2024-08-11 12:17:40,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1093070.0, ans=0.025 2024-08-11 12:17:47,144 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 12:18:07,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1093270.0, ans=0.0 2024-08-11 12:18:07,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-11 12:18:12,915 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.096e+00 2024-08-11 12:18:18,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1093370.0, ans=0.125 2024-08-11 12:18:24,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7900, loss[loss=0.109, beats_loss=0.01256, ecapa_loss=0.0001446, whisper_loss=0.09503, over 23483.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.0001999, whisper_loss=0.09341, over 3891128.59 frames. ], batch size: 87, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:18:27,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1093470.0, ans=0.125 2024-08-11 12:18:35,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1093470.0, ans=0.125 2024-08-11 12:18:40,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1093570.0, ans=0.0 2024-08-11 12:18:51,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1093670.0, ans=0.125 2024-08-11 12:19:08,293 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 12:19:16,039 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 12:19:24,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=12.0 2024-08-11 12:19:30,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 7950, loss[loss=0.1099, beats_loss=0.01036, ecapa_loss=0.0001771, whisper_loss=0.09779, over 16706.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.0113, ecapa_loss=0.0002015, whisper_loss=0.0942, over 3894723.34 frames. ], batch size: 63, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:19:31,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1093970.0, ans=0.125 2024-08-11 12:19:31,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1093970.0, ans=0.125 2024-08-11 12:19:34,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.750e+01 3.082e+01 3.483e+01 5.642e+01, threshold=6.163e+01, percent-clipped=0.0 2024-08-11 12:19:35,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1093970.0, ans=0.125 2024-08-11 12:19:43,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1094070.0, ans=0.04949747468305833 2024-08-11 12:19:45,627 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 12:19:47,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1094070.0, ans=0.125 2024-08-11 12:19:50,060 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 12:19:51,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2024-08-11 12:20:03,435 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 12:20:14,259 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 12:20:16,590 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 12:20:19,398 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 12:20:30,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1094370.0, ans=0.125 2024-08-11 12:20:37,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8000, loss[loss=0.1161, beats_loss=0.01077, ecapa_loss=0.0002246, whisper_loss=0.1031, over 22485.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01128, ecapa_loss=0.0001996, whisper_loss=0.09468, over 3888371.74 frames. ], batch size: 90, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:20:38,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1094470.0, ans=0.125 2024-08-11 12:20:40,398 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 12:20:40,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1094470.0, ans=0.125 2024-08-11 12:20:44,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1094470.0, ans=0.0 2024-08-11 12:20:46,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-11 12:20:51,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1094570.0, ans=0.1 2024-08-11 12:20:56,633 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 12:21:01,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1094570.0, ans=0.0 2024-08-11 12:21:05,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1094670.0, ans=0.125 2024-08-11 12:21:10,486 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 12:21:13,226 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-11 12:21:14,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1094670.0, ans=0.125 2024-08-11 12:21:23,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1094770.0, ans=0.125 2024-08-11 12:21:30,302 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 12:21:44,771 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8050, loss[loss=0.1174, beats_loss=0.008605, ecapa_loss=0.0002114, whisper_loss=0.1067, over 15667.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01129, ecapa_loss=0.0001995, whisper_loss=0.0942, over 3882521.43 frames. ], batch size: 63, lr: 7.87e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:21:48,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.683e+01 3.112e+01 3.562e+01 5.362e+01, threshold=6.224e+01, percent-clipped=0.0 2024-08-11 12:22:02,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1095070.0, ans=0.2 2024-08-11 12:22:02,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1095070.0, ans=0.0 2024-08-11 12:22:04,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1095070.0, ans=0.04949747468305833 2024-08-11 12:22:08,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1095070.0, ans=0.0 2024-08-11 12:22:10,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1095070.0, ans=0.125 2024-08-11 12:22:31,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-08-11 12:22:33,979 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 12:22:37,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. 
limit=15.0 2024-08-11 12:22:44,815 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:22:52,174 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8100, loss[loss=0.1017, beats_loss=0.01304, ecapa_loss=0.0001972, whisper_loss=0.08671, over 21873.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01131, ecapa_loss=0.000199, whisper_loss=0.09344, over 3859441.87 frames. ], batch size: 92, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:23:02,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=12.0 2024-08-11 12:23:03,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=8.0 2024-08-11 12:23:18,886 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 12:23:20,120 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 12:23:26,842 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 12:23:41,797 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 12:23:43,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095770.0, ans=0.1 2024-08-11 12:23:53,857 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 12:23:58,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8150, loss[loss=0.08058, beats_loss=0.01093, ecapa_loss=0.0002119, whisper_loss=0.06753, over 14535.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01132, ecapa_loss=0.0001991, whisper_loss=0.09365, over 3872755.67 frames. 
], batch size: 59, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:24:03,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.662e+01 2.951e+01 3.382e+01 5.794e+01, threshold=5.903e+01, percent-clipped=0.0 2024-08-11 12:24:10,034 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 12:24:25,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1096170.0, ans=0.2 2024-08-11 12:24:27,751 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 12:24:59,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2024-08-11 12:25:04,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1096370.0, ans=0.0 2024-08-11 12:25:05,247 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 12:25:06,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8200, loss[loss=0.08927, beats_loss=0.01226, ecapa_loss=0.0001921, whisper_loss=0.07509, over 21210.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01128, ecapa_loss=0.0002008, whisper_loss=0.09348, over 3874215.85 frames. ], batch size: 86, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:25:08,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1096470.0, ans=10.0 2024-08-11 12:25:09,348 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 12:25:17,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.01 vs. 
limit=22.5 2024-08-11 12:25:22,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1096570.0, ans=0.125 2024-08-11 12:25:47,973 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-11 12:26:05,703 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-11 12:26:07,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1096870.0, ans=0.0 2024-08-11 12:26:12,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8250, loss[loss=0.1028, beats_loss=0.009398, ecapa_loss=0.0002244, whisper_loss=0.09115, over 16927.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01131, ecapa_loss=0.0002011, whisper_loss=0.09332, over 3899039.65 frames. ], batch size: 65, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:26:13,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1096970.0, ans=0.0 2024-08-11 12:26:16,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.782e+01 3.103e+01 3.474e+01 6.879e+01, threshold=6.206e+01, percent-clipped=1.0 2024-08-11 12:26:23,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1096970.0, ans=0.0 2024-08-11 12:26:24,339 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
27 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 12:26:27,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1097070.0, ans=0.0 2024-08-11 12:26:35,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1097070.0, ans=0.0 2024-08-11 12:26:42,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-11 12:26:44,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1097170.0, ans=0.5 2024-08-11 12:26:55,036 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 12:26:56,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1097270.0, ans=0.2 2024-08-11 12:27:05,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1097370.0, ans=22.5 2024-08-11 12:27:09,387 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 12:27:13,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.48 vs. limit=22.5 2024-08-11 12:27:18,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1097470.0, ans=0.125 2024-08-11 12:27:19,904 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8300, loss[loss=0.0822, beats_loss=0.01446, ecapa_loss=0.0002077, whisper_loss=0.06567, over 13472.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01132, ecapa_loss=0.0001995, whisper_loss=0.09287, over 3905966.71 frames. 
], batch size: 56, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:27:35,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1097570.0, ans=0.125 2024-08-11 12:27:47,621 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 12:28:00,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-11 12:28:07,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1097770.0, ans=0.04949747468305833 2024-08-11 12:28:09,130 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 12:28:26,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8350, loss[loss=0.09974, beats_loss=0.01118, ecapa_loss=0.000199, whisper_loss=0.08656, over 17997.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0113, ecapa_loss=0.0002003, whisper_loss=0.0931, over 3906102.03 frames. ], batch size: 71, lr: 7.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:28:30,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.714e+01 3.261e+01 3.683e+01 6.544e+01, threshold=6.523e+01, percent-clipped=1.0 2024-08-11 12:28:48,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2024-08-11 12:28:54,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.87 vs. limit=22.5 2024-08-11 12:28:57,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.65 vs. 
limit=15.0 2024-08-11 12:29:13,888 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 12:29:20,767 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 30 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 12:29:28,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1098370.0, ans=0.05 2024-08-11 12:29:34,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8400, loss[loss=0.1014, beats_loss=0.01306, ecapa_loss=0.0001776, whisper_loss=0.0866, over 20412.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0113, ecapa_loss=0.0002013, whisper_loss=0.09325, over 3898606.57 frames. ], batch size: 80, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:30:01,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1098670.0, ans=0.0 2024-08-11 12:30:03,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1098670.0, ans=0.04949747468305833 2024-08-11 12:30:09,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1098670.0, ans=0.125 2024-08-11 12:30:18,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1098770.0, ans=0.1 2024-08-11 12:30:26,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1098870.0, ans=0.07 2024-08-11 12:30:40,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8450, loss[loss=0.1076, beats_loss=0.009278, ecapa_loss=0.0002063, whisper_loss=0.09626, over 18687.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01126, ecapa_loss=0.000201, whisper_loss=0.09407, over 3909114.71 frames. 
], batch size: 72, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:30:41,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1098970.0, ans=0.0 2024-08-11 12:30:42,007 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 16 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 12:30:44,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.709e+01 3.054e+01 3.505e+01 4.740e+01, threshold=6.108e+01, percent-clipped=0.0 2024-08-11 12:30:47,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1098970.0, ans=0.0 2024-08-11 12:31:01,114 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 12:31:41,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1099370.0, ans=0.0 2024-08-11 12:31:46,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8500, loss[loss=0.08903, beats_loss=0.009909, ecapa_loss=0.0001839, whisper_loss=0.07728, over 16619.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01135, ecapa_loss=0.0001986, whisper_loss=0.09368, over 3901769.73 frames. ], batch size: 61, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:32:31,153 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 12:32:31,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=1099770.0, ans=15.0 2024-08-11 12:32:34,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1099770.0, ans=0.0 2024-08-11 12:32:46,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1099870.0, ans=0.125 2024-08-11 12:32:56,043 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8550, loss[loss=0.09203, beats_loss=0.009563, ecapa_loss=0.0002782, whisper_loss=0.07969, over 16760.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01136, ecapa_loss=0.0001987, whisper_loss=0.09319, over 3912544.88 frames. ], batch size: 72, lr: 7.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 12:32:56,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1099970.0, ans=0.125 2024-08-11 12:33:00,565 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.728e+01 3.009e+01 3.613e+01 5.860e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 12:33:14,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.07 vs. 
limit=22.5 2024-08-11 12:33:22,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1100070.0, ans=0.015 2024-08-11 12:33:22,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1100070.0, ans=0.125 2024-08-11 12:33:25,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1100170.0, ans=0.125 2024-08-11 12:33:33,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1100170.0, ans=0.2 2024-08-11 12:33:35,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1100170.0, ans=0.1 2024-08-11 12:33:39,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-08-11 12:33:49,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1100270.0, ans=0.125 2024-08-11 12:33:57,116 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 12:34:01,565 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 12:34:07,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-11 12:34:08,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1100370.0, ans=0.1 2024-08-11 12:34:11,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8600, loss[loss=0.109, beats_loss=0.012, ecapa_loss=0.0001948, whisper_loss=0.09505, over 20204.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01133, ecapa_loss=0.0001994, whisper_loss=0.09297, over 3893699.92 frames. ], batch size: 80, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:34:22,356 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 12:35:13,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1100870.0, ans=0.125 2024-08-11 12:35:26,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8650, loss[loss=0.1198, beats_loss=0.00894, ecapa_loss=0.0002602, whisper_loss=0.1083, over 21014.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01134, ecapa_loss=0.0002001, whisper_loss=0.09298, over 3863642.68 frames. ], batch size: 85, lr: 7.85e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:35:31,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.631e+01 2.958e+01 3.559e+01 6.258e+01, threshold=5.915e+01, percent-clipped=1.0 2024-08-11 12:35:40,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1101070.0, ans=0.0 2024-08-11 12:35:41,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1101070.0, ans=0.2 2024-08-11 12:35:56,038 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 12:36:02,206 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-11 12:36:12,379 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
17 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 12:36:16,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1101270.0, ans=0.125 2024-08-11 12:36:20,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1101270.0, ans=0.125 2024-08-11 12:36:30,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1101370.0, ans=0.2 2024-08-11 12:36:47,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8700, loss[loss=0.1024, beats_loss=0.01176, ecapa_loss=0.0001896, whisper_loss=0.08877, over 18668.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01125, ecapa_loss=0.0002002, whisper_loss=0.09425, over 3875073.49 frames. ], batch size: 76, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:36:53,474 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 12:37:05,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-11 12:37:34,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2024-08-11 12:37:47,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1101770.0, ans=0.125 2024-08-11 12:37:47,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1101770.0, ans=0.125 2024-08-11 12:38:11,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8750, loss[loss=0.1053, beats_loss=0.01022, ecapa_loss=0.0002107, whisper_loss=0.093, over 13827.00 frames. 
], tot_loss[loss=0.107, beats_loss=0.0112, ecapa_loss=0.0002016, whisper_loss=0.09383, over 3868314.78 frames. ], batch size: 53, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:38:14,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1101970.0, ans=0.025 2024-08-11 12:38:15,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.807e+01 3.199e+01 3.848e+01 5.840e+01, threshold=6.398e+01, percent-clipped=0.0 2024-08-11 12:38:27,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-11 12:38:32,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1102070.0, ans=0.04949747468305833 2024-08-11 12:38:51,940 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-11 12:38:52,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1102170.0, ans=0.125 2024-08-11 12:39:17,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1102370.0, ans=0.2 2024-08-11 12:39:24,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-08-11 12:39:25,817 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8800, loss[loss=0.09405, beats_loss=0.01147, ecapa_loss=0.0001805, whisper_loss=0.08077, over 15693.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01128, ecapa_loss=0.0002009, whisper_loss=0.09369, over 3873340.59 frames. 
], batch size: 62, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:39:33,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1102470.0, ans=0.5 2024-08-11 12:39:38,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2024-08-11 12:39:58,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-08-11 12:40:05,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1102670.0, ans=0.0 2024-08-11 12:40:11,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1102670.0, ans=0.125 2024-08-11 12:40:18,507 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 12:40:44,033 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8850, loss[loss=0.1048, beats_loss=0.01165, ecapa_loss=0.0002055, whisper_loss=0.09109, over 17539.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0114, ecapa_loss=0.0002001, whisper_loss=0.09313, over 3860041.13 frames. ], batch size: 71, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:40:48,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.779e+01 3.220e+01 3.967e+01 6.531e+01, threshold=6.439e+01, percent-clipped=1.0 2024-08-11 12:40:48,560 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 12:41:09,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1103070.0, ans=0.04949747468305833 2024-08-11 12:41:19,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1103170.0, ans=0.125 2024-08-11 12:41:26,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1103170.0, ans=0.125 2024-08-11 12:41:42,802 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 12:41:51,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1103370.0, ans=0.125 2024-08-11 12:41:59,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1103370.0, ans=0.0 2024-08-11 12:42:00,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1103370.0, ans=0.2 2024-08-11 12:42:06,556 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8900, loss[loss=0.1089, beats_loss=0.01186, ecapa_loss=0.0001899, whisper_loss=0.0951, over 21452.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01139, ecapa_loss=0.0001998, whisper_loss=0.09336, over 3848514.69 frames. ], batch size: 89, lr: 7.84e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:42:11,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1103470.0, ans=0.125 2024-08-11 12:42:33,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1103570.0, ans=0.0 2024-08-11 12:42:34,844 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 12:42:36,140 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 12:42:36,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0 2024-08-11 12:42:39,550 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 12:42:46,736 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 12:43:03,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1103770.0, ans=0.125 2024-08-11 12:43:24,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 8950, loss[loss=0.1089, beats_loss=0.01063, ecapa_loss=0.000219, whisper_loss=0.0961, over 22320.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01141, ecapa_loss=0.0001988, whisper_loss=0.09333, over 3860307.48 frames. ], batch size: 90, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:43:28,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.747e+01 3.145e+01 3.619e+01 5.572e+01, threshold=6.290e+01, percent-clipped=0.0 2024-08-11 12:43:30,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1103970.0, ans=0.1 2024-08-11 12:43:45,910 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-11 12:44:03,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1104170.0, ans=0.0 2024-08-11 12:44:06,855 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 12:44:39,424 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9000, loss[loss=0.1325, beats_loss=0.008658, ecapa_loss=0.0002086, whisper_loss=0.1217, over 20701.00 frames. 
], tot_loss[loss=0.1059, beats_loss=0.01145, ecapa_loss=0.0001985, whisper_loss=0.09248, over 3855365.12 frames. ], batch size: 81, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:44:39,425 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 12:45:15,327 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on ASR_libri: loss=0.2575, beats_loss=0, ecapa_loss=0.0006551, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 12:45:34,140 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on SV_voxceleb1: loss=0.005315, beats_loss=0, ecapa_loss=0.0005315, whisper_loss=0, over 939242.00 frames. 2024-08-11 12:45:43,032 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1806, 2.9230, 3.0167, 3.1143], device='cuda:0') 2024-08-11 12:46:16,705 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6059, 3.2367, 1.9853, 1.4922], device='cuda:0') 2024-08-11 12:46:28,678 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2662, 3.5987, 2.3590, 3.9366], device='cuda:0') 2024-08-11 12:47:19,753 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on AT_audioset: loss=0.02529, beats_loss=0.02529, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 12:47:19,758 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 12:47:43,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1104570.0, ans=0.0 2024-08-11 12:47:55,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1104670.0, ans=0.0 2024-08-11 12:47:57,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1104670.0, ans=0.125 2024-08-11 12:48:02,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-08-11 12:48:04,824 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 12:48:06,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1104770.0, ans=0.1 2024-08-11 12:48:08,923 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 12:48:14,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1104770.0, ans=0.0 2024-08-11 12:48:15,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1104770.0, ans=0.125 2024-08-11 12:48:32,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1104870.0, ans=0.1 2024-08-11 12:48:36,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9050, loss[loss=0.09788, beats_loss=0.01185, ecapa_loss=0.0001887, whisper_loss=0.08414, over 22704.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01146, ecapa_loss=0.0001984, whisper_loss=0.09209, over 3845166.04 frames. 
], batch size: 92, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:48:38,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1104970.0, ans=0.0 2024-08-11 12:48:41,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.751e+01 3.167e+01 3.446e+01 7.186e+01, threshold=6.334e+01, percent-clipped=1.0 2024-08-11 12:48:44,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1104970.0, ans=0.05 2024-08-11 12:48:50,105 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 12:49:12,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1105170.0, ans=0.1 2024-08-11 12:49:13,832 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-11 12:49:17,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1105170.0, ans=0.2 2024-08-11 12:49:21,813 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 12:49:30,430 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 12:49:43,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1105370.0, ans=0.125 2024-08-11 12:49:53,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9100, loss[loss=0.08922, beats_loss=0.01304, ecapa_loss=0.0001864, whisper_loss=0.07432, over 22163.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.0002, whisper_loss=0.09311, over 3839322.13 frames. 
], batch size: 91, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:49:55,283 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 12:49:58,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105470.0, ans=0.125 2024-08-11 12:50:01,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1105470.0, ans=0.1 2024-08-11 12:50:22,753 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 12:50:25,529 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-11 12:50:29,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1105670.0, ans=0.125 2024-08-11 12:50:31,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1105670.0, ans=0.125 2024-08-11 12:50:42,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1105770.0, ans=0.125 2024-08-11 12:50:42,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1105770.0, ans=0.0 2024-08-11 12:51:00,653 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 12:51:01,990 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 12:51:05,261 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 12:51:10,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9150, loss[loss=0.1119, beats_loss=0.009493, ecapa_loss=0.0001921, whisper_loss=0.1005, over 22517.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01137, ecapa_loss=0.0001981, whisper_loss=0.09313, over 3853298.38 frames. ], batch size: 88, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:51:11,899 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 12:51:14,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.723e+01 3.003e+01 3.393e+01 4.790e+01, threshold=6.006e+01, percent-clipped=0.0 2024-08-11 12:51:33,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1106070.0, ans=0.125 2024-08-11 12:51:34,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=4.817e-02 2024-08-11 12:51:44,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1106170.0, ans=0.2 2024-08-11 12:51:47,197 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 12:52:02,254 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 12:52:02,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1106270.0, ans=0.125 2024-08-11 12:52:14,180 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-11 12:52:20,348 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.783e-01 2024-08-11 12:52:25,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9200, loss[loss=0.1108, beats_loss=0.01133, ecapa_loss=0.0001585, whisper_loss=0.09784, over 19623.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01131, ecapa_loss=0.0001989, whisper_loss=0.09354, over 3881283.20 frames. 
], batch size: 75, lr: 7.83e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:52:31,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1106470.0, ans=0.125 2024-08-11 12:53:14,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-11 12:53:15,751 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 12:53:17,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.61 vs. limit=15.0 2024-08-11 12:53:23,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1106770.0, ans=0.125 2024-08-11 12:53:24,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1106770.0, ans=0.0 2024-08-11 12:53:33,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1106870.0, ans=0.1 2024-08-11 12:53:36,143 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 12:53:42,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9250, loss[loss=0.09891, beats_loss=0.01353, ecapa_loss=0.0002036, whisper_loss=0.08335, over 21188.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01142, ecapa_loss=0.0001988, whisper_loss=0.09284, over 3902363.56 frames. 
], batch size: 90, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:53:47,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.771e+01 3.106e+01 3.599e+01 1.159e+02, threshold=6.212e+01, percent-clipped=1.0 2024-08-11 12:54:02,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1107070.0, ans=0.04949747468305833 2024-08-11 12:54:09,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1107070.0, ans=0.0 2024-08-11 12:54:12,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1107170.0, ans=0.1 2024-08-11 12:54:24,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2024-08-11 12:54:29,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1107270.0, ans=0.07 2024-08-11 12:54:37,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1107270.0, ans=0.1 2024-08-11 12:54:46,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2024-08-11 12:54:57,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9300, loss[loss=0.1088, beats_loss=0.00887, ecapa_loss=0.0001936, whisper_loss=0.09798, over 21949.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01127, ecapa_loss=0.0001993, whisper_loss=0.09408, over 3918643.40 frames. 
], batch size: 87, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:55:09,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1107470.0, ans=0.2 2024-08-11 12:55:10,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2024-08-11 12:55:35,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1107670.0, ans=0.125 2024-08-11 12:55:39,387 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 12:55:47,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1107770.0, ans=0.0 2024-08-11 12:56:00,458 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 12:56:06,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1107870.0, ans=0.1 2024-08-11 12:56:12,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9350, loss[loss=0.09555, beats_loss=0.01395, ecapa_loss=0.000184, whisper_loss=0.07977, over 21873.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001998, whisper_loss=0.09318, over 3895969.12 frames. ], batch size: 89, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:56:17,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.772e+01 2.988e+01 3.438e+01 1.215e+02, threshold=5.975e+01, percent-clipped=1.0 2024-08-11 12:56:26,528 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 12:56:26,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1107970.0, ans=0.125 2024-08-11 12:56:29,792 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 12:56:42,232 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-11 12:56:58,388 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 12:57:00,057 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 12:57:28,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9400, loss[loss=0.122, beats_loss=0.009187, ecapa_loss=0.0002204, whisper_loss=0.1107, over 19346.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.0113, ecapa_loss=0.0001999, whisper_loss=0.0938, over 3915179.09 frames. ], batch size: 74, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:57:30,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1108470.0, ans=0.125 2024-08-11 12:57:41,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1108470.0, ans=0.125 2024-08-11 12:57:45,565 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 12:57:47,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=10.0 2024-08-11 12:57:58,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1108670.0, ans=0.125 2024-08-11 12:58:01,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1108670.0, ans=0.125 2024-08-11 12:58:45,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9450, loss[loss=0.1205, beats_loss=0.008833, ecapa_loss=0.0002341, whisper_loss=0.1094, over 22811.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01132, ecapa_loss=0.0001996, whisper_loss=0.09305, over 3921653.25 frames. ], batch size: 91, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 12:58:47,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1108970.0, ans=0.2 2024-08-11 12:58:50,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.669e+01 3.064e+01 3.549e+01 5.554e+01, threshold=6.127e+01, percent-clipped=0.0 2024-08-11 12:58:50,564 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 12:58:55,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1108970.0, ans=0.2 2024-08-11 12:58:59,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1109070.0, ans=0.1 2024-08-11 12:59:08,535 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 8 from Vox, 41 fro AS 2024-08-11 12:59:15,876 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 12:59:34,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1109270.0, ans=0.2 2024-08-11 12:59:44,698 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=8.209e-03 2024-08-11 12:59:45,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1109370.0, ans=0.125 2024-08-11 12:59:47,009 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 12:59:48,293 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 12:59:52,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-11 13:00:00,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9500, loss[loss=0.1065, beats_loss=0.009131, ecapa_loss=0.0001666, whisper_loss=0.09566, over 14787.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01138, ecapa_loss=0.0002004, whisper_loss=0.09253, over 3905218.51 frames. ], batch size: 55, lr: 7.82e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:00:08,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. 
limit=6.0 2024-08-11 13:00:26,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1109570.0, ans=0.0 2024-08-11 13:00:46,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1109770.0, ans=0.125 2024-08-11 13:00:58,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1109870.0, ans=0.125 2024-08-11 13:01:13,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9550, loss[loss=0.1061, beats_loss=0.01139, ecapa_loss=0.0001757, whisper_loss=0.09299, over 20791.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01138, ecapa_loss=0.0001999, whisper_loss=0.09239, over 3905150.89 frames. ], batch size: 82, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:01:18,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.581e+01 3.097e+01 3.550e+01 5.814e+01, threshold=6.195e+01, percent-clipped=0.0 2024-08-11 13:01:22,951 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 13:01:34,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1110070.0, ans=0.125 2024-08-11 13:01:44,078 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 13:02:15,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1110370.0, ans=0.0 2024-08-11 13:02:18,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1110370.0, ans=0.125 2024-08-11 13:02:24,881 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 13:02:27,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1110470.0, ans=0.125 2024-08-11 13:02:28,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9600, loss[loss=0.1233, beats_loss=0.01062, ecapa_loss=0.000174, whisper_loss=0.111, over 17173.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01133, ecapa_loss=0.0001986, whisper_loss=0.09289, over 3886760.19 frames. ], batch size: 66, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:02:40,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110470.0, ans=0.1 2024-08-11 13:02:45,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1110570.0, ans=0.2 2024-08-11 13:03:03,214 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 6 from Vox, 27 fro AS 2024-08-11 13:03:11,813 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 13:03:18,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1110770.0, ans=0.125 2024-08-11 13:03:37,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1110870.0, ans=0.1 2024-08-11 13:03:39,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9650, loss[loss=0.1072, beats_loss=0.009879, ecapa_loss=0.0001844, whisper_loss=0.09551, over 19353.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01125, ecapa_loss=0.0001992, whisper_loss=0.09364, over 3857932.35 frames. ], batch size: 75, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:03:42,393 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
28 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 13:03:43,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.716e+01 3.002e+01 3.574e+01 5.577e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 13:04:11,117 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 13:04:26,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1111270.0, ans=0.1 2024-08-11 13:04:46,557 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 13:04:50,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9700, loss[loss=0.09499, beats_loss=0.01159, ecapa_loss=0.0001646, whisper_loss=0.08175, over 19782.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001998, whisper_loss=0.09297, over 3851969.31 frames. ], batch size: 79, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:04:55,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1111470.0, ans=0.035 2024-08-11 13:05:00,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1111470.0, ans=0.125 2024-08-11 13:05:02,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1111470.0, ans=0.0 2024-08-11 13:05:08,157 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 13:05:08,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1111570.0, ans=0.2 2024-08-11 13:05:21,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1111670.0, ans=0.125 2024-08-11 13:05:23,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1111670.0, ans=0.125 2024-08-11 13:05:24,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1111670.0, ans=0.0 2024-08-11 13:05:38,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1111770.0, ans=0.07 2024-08-11 13:05:44,056 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 13:05:44,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1111770.0, ans=0.1 2024-08-11 13:05:54,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1111870.0, ans=0.1 2024-08-11 13:06:03,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9750, loss[loss=0.09553, beats_loss=0.01196, ecapa_loss=0.0002001, whisper_loss=0.08157, over 21227.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01125, ecapa_loss=0.0001998, whisper_loss=0.09248, over 3829608.81 frames. 
], batch size: 88, lr: 7.81e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:06:08,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.596e+01 2.916e+01 3.374e+01 5.743e+01, threshold=5.832e+01, percent-clipped=0.0 2024-08-11 13:06:08,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=12.0 2024-08-11 13:06:11,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1111970.0, ans=0.125 2024-08-11 13:06:22,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-11 13:06:22,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0 2024-08-11 13:06:26,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2024-08-11 13:06:40,034 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-11 13:06:40,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1112170.0, ans=0.1 2024-08-11 13:07:06,436 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 13:07:17,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9800, loss[loss=0.1025, beats_loss=0.01125, ecapa_loss=0.0002084, whisper_loss=0.0892, over 21025.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01126, ecapa_loss=0.0001991, whisper_loss=0.09286, over 3856736.32 frames. 
], batch size: 84, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:07:24,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-08-11 13:07:59,583 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 13:08:01,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1112770.0, ans=0.0 2024-08-11 13:08:32,901 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9850, loss[loss=0.1067, beats_loss=0.01219, ecapa_loss=0.0001893, whisper_loss=0.09263, over 15968.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01129, ecapa_loss=0.0001997, whisper_loss=0.09324, over 3862618.24 frames. ], batch size: 63, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:08:37,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.640e+01 2.920e+01 3.284e+01 5.372e+01, threshold=5.839e+01, percent-clipped=0.0 2024-08-11 13:08:37,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1112970.0, ans=0.0 2024-08-11 13:08:49,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1113070.0, ans=0.0 2024-08-11 13:08:49,264 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.362e-01 2024-08-11 13:08:50,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1113070.0, ans=0.07 2024-08-11 13:09:07,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1113170.0, ans=0.125 2024-08-11 13:09:09,291 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 13:09:09,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1113170.0, ans=0.07 2024-08-11 13:09:10,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1113170.0, ans=0.1 2024-08-11 13:09:42,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1113370.0, ans=0.0 2024-08-11 13:09:50,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9900, loss[loss=0.1061, beats_loss=0.01089, ecapa_loss=0.0002006, whisper_loss=0.09317, over 17530.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01129, ecapa_loss=0.0001993, whisper_loss=0.09324, over 3877580.83 frames. ], batch size: 69, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:09:51,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-11 13:09:55,895 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 13:09:56,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1113470.0, ans=0.125 2024-08-11 13:09:58,684 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 13:10:02,262 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 13:10:08,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1113570.0, ans=0.0 2024-08-11 13:10:09,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. 
limit=22.5 2024-08-11 13:10:31,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1113670.0, ans=0.125 2024-08-11 13:10:44,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1113770.0, ans=0.125 2024-08-11 13:10:51,035 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 13:10:59,405 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 34 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 13:11:12,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2024-08-11 13:11:16,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1113970.0, ans=0.125 2024-08-11 13:11:18,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 9950, loss[loss=0.09588, beats_loss=0.01118, ecapa_loss=0.0002532, whisper_loss=0.08217, over 18540.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01127, ecapa_loss=0.0001999, whisper_loss=0.09399, over 3888289.25 frames. ], batch size: 81, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:11:24,433 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.684e+01 2.921e+01 3.407e+01 1.322e+02, threshold=5.842e+01, percent-clipped=4.0 2024-08-11 13:11:36,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1114070.0, ans=0.04949747468305833 2024-08-11 13:11:39,584 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 44 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 13:11:44,256 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
13 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-11 13:12:50,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10000, loss[loss=0.09241, beats_loss=0.01305, ecapa_loss=0.0001966, whisper_loss=0.0774, over 19098.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01118, ecapa_loss=0.0001999, whisper_loss=0.09434, over 3871170.91 frames. ], batch size: 79, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:12:59,273 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 13:13:01,577 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-11 13:13:07,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1114570.0, ans=0.2 2024-08-11 13:13:24,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1114570.0, ans=0.0 2024-08-11 13:13:31,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1114670.0, ans=0.125 2024-08-11 13:13:52,225 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-11 13:14:06,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-11 13:14:10,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1114870.0, ans=0.125 2024-08-11 13:14:20,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10050, loss[loss=0.1115, beats_loss=0.008951, ecapa_loss=0.0002298, whisper_loss=0.1002, over 18721.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01116, ecapa_loss=0.0001996, whisper_loss=0.09416, over 3871171.76 frames. 
], batch size: 78, lr: 7.80e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:14:21,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1114970.0, ans=0.0 2024-08-11 13:14:26,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.703e+01 2.998e+01 3.429e+01 6.033e+01, threshold=5.996e+01, percent-clipped=1.0 2024-08-11 13:14:28,451 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 13:15:04,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1115170.0, ans=0.2 2024-08-11 13:15:12,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115170.0, ans=0.1 2024-08-11 13:15:28,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1115270.0, ans=0.125 2024-08-11 13:15:35,432 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 13:15:57,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10100, loss[loss=0.09121, beats_loss=0.01129, ecapa_loss=0.0001903, whisper_loss=0.07802, over 21887.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01113, ecapa_loss=0.0002007, whisper_loss=0.09439, over 3889687.69 frames. ], batch size: 90, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:16:02,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. 
limit=15.0 2024-08-11 13:16:35,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1115570.0, ans=0.125 2024-08-11 13:16:48,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1115670.0, ans=0.125 2024-08-11 13:16:53,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-11 13:16:58,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-11 13:17:04,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1115770.0, ans=15.0 2024-08-11 13:17:07,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1115770.0, ans=0.025 2024-08-11 13:17:16,560 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 13:17:26,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1115870.0, ans=0.125 2024-08-11 13:17:45,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10150, loss[loss=0.1255, beats_loss=0.009278, ecapa_loss=0.0002169, whisper_loss=0.114, over 20293.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01124, ecapa_loss=0.0001991, whisper_loss=0.0938, over 3873623.46 frames. 
], batch size: 79, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:17:45,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1115970.0, ans=0.125 2024-08-11 13:17:49,648 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.757e+01 3.072e+01 3.612e+01 1.119e+02, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:17:49,877 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 13:17:51,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. limit=22.5 2024-08-11 13:17:53,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2024-08-11 13:18:27,188 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 13:18:52,119 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 13:18:52,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1116370.0, ans=0.2 2024-08-11 13:19:00,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10200, loss[loss=0.07162, beats_loss=0.01399, ecapa_loss=0.0002148, whisper_loss=0.05548, over 14044.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01121, ecapa_loss=0.0001995, whisper_loss=0.09435, over 3876501.46 frames. 
], batch size: 60, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:19:02,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1116470.0, ans=0.125 2024-08-11 13:19:30,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=15.0 2024-08-11 13:19:40,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1116670.0, ans=0.125 2024-08-11 13:19:58,554 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 38 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 13:20:00,409 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 13:20:19,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10250, loss[loss=0.09991, beats_loss=0.01164, ecapa_loss=0.0001796, whisper_loss=0.08647, over 18439.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01121, ecapa_loss=0.0001983, whisper_loss=0.09434, over 3888116.85 frames. ], batch size: 76, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:20:23,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.168e+01 2.664e+01 3.001e+01 3.567e+01 5.136e+01, threshold=6.003e+01, percent-clipped=0.0 2024-08-11 13:20:32,763 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 13:20:51,984 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 13:20:55,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1117170.0, ans=0.0 2024-08-11 13:21:40,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10300, loss[loss=0.0965, beats_loss=0.01187, ecapa_loss=0.0001911, whisper_loss=0.08271, over 22777.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01133, ecapa_loss=0.0001982, whisper_loss=0.09314, over 3885490.48 frames. ], batch size: 93, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:21:46,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1117470.0, ans=0.125 2024-08-11 13:21:53,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1117470.0, ans=0.2 2024-08-11 13:21:55,132 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-11 13:22:00,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1117570.0, ans=0.0 2024-08-11 13:22:40,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1117770.0, ans=0.2 2024-08-11 13:22:53,011 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 13:23:01,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10350, loss[loss=0.1111, beats_loss=0.0118, ecapa_loss=0.0001952, whisper_loss=0.09732, over 19882.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01135, ecapa_loss=0.0001983, whisper_loss=0.09325, over 3880463.78 frames. ], batch size: 79, lr: 7.79e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:23:06,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.796e+01 3.108e+01 3.786e+01 6.316e+01, threshold=6.215e+01, percent-clipped=1.0 2024-08-11 13:23:08,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1117970.0, ans=0.0 2024-08-11 13:23:13,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. 
limit=6.0 2024-08-11 13:23:14,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1117970.0, ans=0.2 2024-08-11 13:23:37,364 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 13:23:46,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:23:52,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:24:01,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1118270.0, ans=0.125 2024-08-11 13:24:06,457 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 13:24:07,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2024-08-11 13:24:15,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1118370.0, ans=0.125 2024-08-11 13:24:18,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10400, loss[loss=0.09529, beats_loss=0.01381, ecapa_loss=0.0001909, whisper_loss=0.07958, over 21921.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01138, ecapa_loss=0.0001977, whisper_loss=0.09283, over 3880021.73 frames. ], batch size: 91, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:24:20,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. 
limit=10.0 2024-08-11 13:24:26,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1118470.0, ans=0.0 2024-08-11 13:24:29,266 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 13:24:35,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1118570.0, ans=0.125 2024-08-11 13:24:41,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1118570.0, ans=0.0 2024-08-11 13:24:41,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1118570.0, ans=0.125 2024-08-11 13:25:04,916 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 41 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 13:25:19,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-08-11 13:25:25,120 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 13:25:30,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.93 vs. limit=22.5 2024-08-11 13:25:35,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10450, loss[loss=0.09352, beats_loss=0.0113, ecapa_loss=0.0001976, whisper_loss=0.08025, over 18931.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01136, ecapa_loss=0.0001977, whisper_loss=0.09329, over 3862500.54 frames. 
], batch size: 76, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:25:39,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.711e+01 3.019e+01 3.517e+01 4.993e+01, threshold=6.039e+01, percent-clipped=0.0 2024-08-11 13:25:41,455 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 13:25:44,592 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.677e+05 2024-08-11 13:25:53,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1119070.0, ans=0.125 2024-08-11 13:25:59,674 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 17 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-11 13:26:03,088 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 13:26:03,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.39 vs. limit=22.5 2024-08-11 13:26:05,816 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 13:26:12,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1119170.0, ans=0.5 2024-08-11 13:26:21,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1119270.0, ans=0.125 2024-08-11 13:26:32,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1119270.0, ans=0.125 2024-08-11 13:26:49,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1119370.0, ans=0.125 2024-08-11 13:26:49,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-11 13:26:53,841 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10500, loss[loss=0.1205, beats_loss=0.009039, ecapa_loss=0.000193, whisper_loss=0.1095, over 21108.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01121, ecapa_loss=0.0001976, whisper_loss=0.09406, over 3870832.16 frames. ], batch size: 80, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:26:55,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=22.5 2024-08-11 13:27:10,352 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 13:27:32,136 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 13:27:59,072 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 13:28:08,706 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 13:28:11,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10550, loss[loss=0.1057, beats_loss=0.01126, ecapa_loss=0.0001834, whisper_loss=0.09256, over 13590.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01115, ecapa_loss=0.0001986, whisper_loss=0.09443, over 3882443.44 frames. ], batch size: 54, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:28:11,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1119970.0, ans=0.125 2024-08-11 13:28:14,155 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-112000.pt 2024-08-11 13:28:17,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.650e+01 3.072e+01 3.667e+01 9.491e+01, threshold=6.144e+01, percent-clipped=1.0 2024-08-11 13:28:19,202 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 13:28:19,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-11 13:28:20,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1119970.0, ans=0.1 2024-08-11 13:28:31,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1120070.0, ans=0.015 2024-08-11 13:28:36,576 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 13:28:38,396 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-11 13:28:43,014 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-11 13:28:48,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1120170.0, ans=0.5 2024-08-11 13:28:56,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1120170.0, ans=0.1 2024-08-11 13:29:01,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1120270.0, ans=0.5 2024-08-11 13:29:02,479 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 13:29:06,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1120270.0, ans=0.125 2024-08-11 13:29:09,783 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 13:29:21,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120370.0, ans=0.125 2024-08-11 13:29:23,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1120370.0, ans=0.1 2024-08-11 13:29:33,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10600, loss[loss=0.09445, beats_loss=0.01324, ecapa_loss=0.0002205, whisper_loss=0.07901, over 21110.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01123, ecapa_loss=0.0001989, whisper_loss=0.09347, over 3856566.05 frames. ], batch size: 90, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:29:34,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. 
limit=22.5 2024-08-11 13:29:40,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1120470.0, ans=0.125 2024-08-11 13:29:42,812 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 13:30:03,823 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 13:30:08,508 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 13:30:43,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1120870.0, ans=0.125 2024-08-11 13:30:50,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10650, loss[loss=0.1008, beats_loss=0.01088, ecapa_loss=0.0002234, whisper_loss=0.08767, over 16177.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0001986, whisper_loss=0.0932, over 3848026.25 frames. ], batch size: 67, lr: 7.78e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:30:54,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1120970.0, ans=0.125 2024-08-11 13:30:54,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2024-08-11 13:30:57,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.737e+01 3.110e+01 3.500e+01 6.521e+01, threshold=6.221e+01, percent-clipped=1.0 2024-08-11 13:31:08,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. 
limit=15.0 2024-08-11 13:31:27,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1121170.0, ans=0.125 2024-08-11 13:32:05,189 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 27 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 13:32:07,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1121370.0, ans=0.2 2024-08-11 13:32:10,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10700, loss[loss=0.09492, beats_loss=0.01107, ecapa_loss=0.0001682, whisper_loss=0.08216, over 16129.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01122, ecapa_loss=0.0001993, whisper_loss=0.09314, over 3847337.86 frames. ], batch size: 60, lr: 7.77e-03, grad_scale: 1.152921504606847e+18 2024-08-11 13:32:22,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1121470.0, ans=0.0 2024-08-11 13:32:36,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121570.0, ans=0.1 2024-08-11 13:32:54,581 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.886e-01 2024-08-11 13:33:01,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1121770.0, ans=0.1 2024-08-11 13:33:09,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1121770.0, ans=0.125 2024-08-11 13:33:11,440 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 13:33:20,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.66 vs. 
limit=10.0 2024-08-11 13:33:20,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2024-08-11 13:33:28,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5 2024-08-11 13:33:31,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10750, loss[loss=0.1321, beats_loss=0.01051, ecapa_loss=0.0001832, whisper_loss=0.1197, over 15962.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0113, ecapa_loss=0.0001982, whisper_loss=0.09339, over 3844733.32 frames. ], batch size: 62, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:33:32,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.04 vs. limit=15.0 2024-08-11 13:33:38,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+01 2.776e+01 3.070e+01 3.397e+01 5.449e+01, threshold=6.140e+01, percent-clipped=0.0 2024-08-11 13:33:43,288 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-11 13:33:54,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-11 13:34:20,471 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 13 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 13:34:22,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.43 vs. 
limit=15.0 2024-08-11 13:34:23,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1122270.0, ans=0.05 2024-08-11 13:34:25,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.15 vs. limit=15.0 2024-08-11 13:34:43,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1122370.0, ans=0.0 2024-08-11 13:34:46,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1122370.0, ans=0.0 2024-08-11 13:34:46,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1122370.0, ans=0.2 2024-08-11 13:34:48,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1122470.0, ans=0.0 2024-08-11 13:34:49,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10800, loss[loss=0.08779, beats_loss=0.01446, ecapa_loss=0.0001868, whisper_loss=0.07146, over 18319.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01133, ecapa_loss=0.000198, whisper_loss=0.09414, over 3857563.56 frames. ], batch size: 71, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:35:39,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1122770.0, ans=0.1 2024-08-11 13:35:50,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1122870.0, ans=10.0 2024-08-11 13:36:07,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10850, loss[loss=0.1254, beats_loss=0.01041, ecapa_loss=0.0001976, whisper_loss=0.113, over 17509.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.0113, ecapa_loss=0.000198, whisper_loss=0.09463, over 3867823.18 frames. 
], batch size: 70, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:36:15,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.852e+01 3.448e+01 4.280e+01 7.389e+01, threshold=6.896e+01, percent-clipped=2.0 2024-08-11 13:36:18,712 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 13:36:39,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.41 vs. limit=22.5 2024-08-11 13:36:42,232 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.710e-01 2024-08-11 13:36:56,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-08-11 13:37:02,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2024-08-11 13:37:21,301 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 13:37:21,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1123370.0, ans=0.125 2024-08-11 13:37:22,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1123370.0, ans=0.125 2024-08-11 13:37:29,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10900, loss[loss=0.1008, beats_loss=0.009882, ecapa_loss=0.0001789, whisper_loss=0.08914, over 21828.00 frames. ], tot_loss[loss=0.108, beats_loss=0.01123, ecapa_loss=0.0001974, whisper_loss=0.09484, over 3869315.48 frames. ], batch size: 86, lr: 7.77e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:37:29,299 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 13:37:49,866 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-11 13:38:07,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1123670.0, ans=0.0 2024-08-11 13:38:13,303 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 25 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-11 13:38:42,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5 2024-08-11 13:38:45,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 10950, loss[loss=0.1273, beats_loss=0.01052, ecapa_loss=0.0001978, whisper_loss=0.1148, over 19487.00 frames. ], tot_loss[loss=0.1081, beats_loss=0.01131, ecapa_loss=0.0001958, whisper_loss=0.09481, over 3921789.06 frames. ], batch size: 75, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:38:52,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1123970.0, ans=0.125 2024-08-11 13:38:53,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+01 2.774e+01 3.085e+01 3.666e+01 6.229e+01, threshold=6.171e+01, percent-clipped=0.0 2024-08-11 13:38:57,237 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 13 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 13:39:08,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1124070.0, ans=0.04949747468305833 2024-08-11 13:39:43,473 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 13:39:50,217 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 13:39:54,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1124370.0, ans=0.09899494936611666 2024-08-11 13:39:57,369 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 13:40:03,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11000, loss[loss=0.1086, beats_loss=0.008428, ecapa_loss=0.0002809, whisper_loss=0.09735, over 21132.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01129, ecapa_loss=0.0001976, whisper_loss=0.09435, over 3942965.33 frames. ], batch size: 91, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:40:04,602 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 13:40:31,876 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:40:35,329 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 13:40:42,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1124670.0, ans=0.1 2024-08-11 13:40:47,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1124670.0, ans=0.125 2024-08-11 13:40:51,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1124770.0, ans=0.0 2024-08-11 13:40:53,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1124770.0, ans=0.0 2024-08-11 13:40:55,443 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 13:41:07,249 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 13:41:21,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5 2024-08-11 13:41:22,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11050, loss[loss=0.09944, beats_loss=0.01229, ecapa_loss=0.0002237, whisper_loss=0.08491, over 17438.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01134, ecapa_loss=0.0001972, whisper_loss=0.09385, over 3944150.65 frames. ], batch size: 70, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:41:29,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+01 2.704e+01 3.049e+01 3.665e+01 6.034e+01, threshold=6.098e+01, percent-clipped=0.0 2024-08-11 13:41:56,800 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 13:42:25,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1125370.0, ans=0.1 2024-08-11 13:42:30,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1125370.0, ans=0.125 2024-08-11 13:42:39,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11100, loss[loss=0.1178, beats_loss=0.007996, ecapa_loss=0.0002204, whisper_loss=0.1076, over 14940.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01139, ecapa_loss=0.0001958, whisper_loss=0.09346, over 3912824.18 frames. ], batch size: 55, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:43:51,225 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 13:44:01,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11150, loss[loss=0.1, beats_loss=0.01138, ecapa_loss=0.0002271, whisper_loss=0.08639, over 21048.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01144, ecapa_loss=0.0001947, whisper_loss=0.09282, over 3903847.73 frames. ], batch size: 92, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:44:03,598 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 13:44:09,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.627e+01 3.035e+01 3.415e+01 6.543e+01, threshold=6.070e+01, percent-clipped=1.0 2024-08-11 13:44:19,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126070.0, ans=0.0 2024-08-11 13:44:40,519 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 13:44:40,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1126170.0, ans=0.0 2024-08-11 13:45:03,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1126370.0, ans=0.125 2024-08-11 13:45:07,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2024-08-11 13:45:18,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11200, loss[loss=0.09667, beats_loss=0.009432, ecapa_loss=0.0002012, whisper_loss=0.08523, over 19372.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01142, ecapa_loss=0.000194, whisper_loss=0.09308, over 3902925.80 frames. 
], batch size: 76, lr: 7.76e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:45:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1126470.0, ans=0.1 2024-08-11 13:45:21,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1126470.0, ans=0.125 2024-08-11 13:45:25,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1126470.0, ans=0.125 2024-08-11 13:45:30,262 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 13:45:32,173 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 13:45:34,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1126570.0, ans=0.0 2024-08-11 13:45:47,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1126570.0, ans=0.95 2024-08-11 13:46:09,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-11 13:46:41,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1126970.0, ans=0.2 2024-08-11 13:46:42,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11250, loss[loss=0.1234, beats_loss=0.01001, ecapa_loss=0.0001814, whisper_loss=0.1116, over 24472.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01133, ecapa_loss=0.0001955, whisper_loss=0.09332, over 3894734.18 frames. 
], batch size: 93, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:46:52,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.684e+01 2.944e+01 3.546e+01 6.829e+01, threshold=5.887e+01, percent-clipped=2.0 2024-08-11 13:46:54,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-11 13:46:56,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.81 vs. limit=22.5 2024-08-11 13:46:57,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1126970.0, ans=0.02 2024-08-11 13:47:00,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1127070.0, ans=0.125 2024-08-11 13:47:18,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1127170.0, ans=0.2 2024-08-11 13:47:32,319 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 13:47:41,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2024-08-11 13:47:41,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2024-08-11 13:47:44,127 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-11 13:47:53,168 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
23 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-11 13:47:58,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1127370.0, ans=0.0 2024-08-11 13:48:05,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11300, loss[loss=0.08777, beats_loss=0.01389, ecapa_loss=0.0002078, whisper_loss=0.07181, over 21862.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0113, ecapa_loss=0.0001957, whisper_loss=0.09333, over 3880443.43 frames. ], batch size: 93, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:48:10,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2024-08-11 13:48:11,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1127470.0, ans=0.125 2024-08-11 13:48:13,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1127470.0, ans=0.09899494936611666 2024-08-11 13:48:24,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1127570.0, ans=0.125 2024-08-11 13:48:27,921 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 13:48:29,238 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 13:48:40,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1127670.0, ans=0.0 2024-08-11 13:48:58,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-11 13:49:07,143 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 13:49:10,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1127870.0, ans=0.0 2024-08-11 13:49:11,430 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 13:49:17,738 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 13:49:25,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11350, loss[loss=0.08025, beats_loss=0.013, ecapa_loss=0.0001941, whisper_loss=0.06531, over 20357.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01128, ecapa_loss=0.000196, whisper_loss=0.09284, over 3881278.38 frames. ], batch size: 87, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:49:33,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.648e+01 3.083e+01 3.583e+01 5.645e+01, threshold=6.165e+01, percent-clipped=0.0 2024-08-11 13:49:51,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.88 vs. limit=15.0 2024-08-11 13:49:58,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1128170.0, ans=0.0 2024-08-11 13:50:12,637 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
17 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 13:50:24,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1128270.0, ans=0.125 2024-08-11 13:50:41,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1128370.0, ans=0.035 2024-08-11 13:50:41,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1128370.0, ans=0.125 2024-08-11 13:50:44,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11400, loss[loss=0.09257, beats_loss=0.01323, ecapa_loss=0.0001805, whisper_loss=0.07754, over 22138.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01129, ecapa_loss=0.0001972, whisper_loss=0.09322, over 3917263.18 frames. ], batch size: 91, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:51:03,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1128570.0, ans=0.125 2024-08-11 13:51:14,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1128670.0, ans=0.0 2024-08-11 13:51:48,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1128870.0, ans=0.1 2024-08-11 13:51:52,756 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 13:51:56,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1128870.0, ans=0.2 2024-08-11 13:51:59,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11450, loss[loss=0.119, beats_loss=0.009642, ecapa_loss=0.0001956, whisper_loss=0.1074, over 18377.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01128, ecapa_loss=0.0001986, whisper_loss=0.09314, over 3909075.21 frames. 
], batch size: 73, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:52:07,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.738e+01 3.140e+01 3.413e+01 5.128e+01, threshold=6.280e+01, percent-clipped=0.0 2024-08-11 13:52:14,299 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 13:52:18,329 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 13:52:25,212 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 13:52:58,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1129270.0, ans=0.2 2024-08-11 13:53:17,919 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11500, loss[loss=0.09688, beats_loss=0.01399, ecapa_loss=0.0001792, whisper_loss=0.08109, over 17869.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01134, ecapa_loss=0.0001974, whisper_loss=0.09315, over 3911650.65 frames. ], batch size: 71, lr: 7.75e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:53:41,490 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 13:53:41,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1129570.0, ans=0.125 2024-08-11 13:53:41,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1129570.0, ans=0.125 2024-08-11 13:53:44,241 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 13:53:48,570 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 13:53:52,964 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
21 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-11 13:53:59,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1129670.0, ans=0.0 2024-08-11 13:54:14,224 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 13:54:22,816 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 27 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 13:54:35,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11550, loss[loss=0.1018, beats_loss=0.01129, ecapa_loss=0.0001982, whisper_loss=0.08856, over 21745.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0113, ecapa_loss=0.0001978, whisper_loss=0.09325, over 3892896.74 frames. ], batch size: 90, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:54:38,332 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 29 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 13:54:45,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.847e+01 3.236e+01 3.830e+01 5.730e+01, threshold=6.473e+01, percent-clipped=0.0 2024-08-11 13:55:01,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. limit=10.0 2024-08-11 13:55:06,614 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 13:55:12,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1130170.0, ans=10.0 2024-08-11 13:55:21,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2024-08-11 13:55:34,154 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 13:55:59,488 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11600, loss[loss=0.08603, beats_loss=0.01313, ecapa_loss=0.0001914, whisper_loss=0.07099, over 18475.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01133, ecapa_loss=0.0001975, whisper_loss=0.09321, over 3882346.78 frames. ], batch size: 75, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:56:01,293 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 13:56:21,536 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 13:56:23,262 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 13:56:30,947 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 13:56:39,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-11 13:56:55,274 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-11 13:57:05,627 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.670e-01 2024-08-11 13:57:05,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-11 13:57:09,845 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 13:57:17,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11650, loss[loss=0.1118, beats_loss=0.01093, ecapa_loss=0.0002174, whisper_loss=0.09871, over 17079.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01137, ecapa_loss=0.0001967, whisper_loss=0.09342, over 3907169.00 frames. 
], batch size: 67, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:57:22,396 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 13:57:26,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.647e+01 2.966e+01 3.476e+01 5.523e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 13:57:32,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=12.0 2024-08-11 13:57:36,006 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 13:57:36,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2024-08-11 13:57:48,407 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-11 13:57:49,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.02 vs. limit=15.0 2024-08-11 13:57:51,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-11 13:57:59,279 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 13:58:03,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-11 13:58:04,813 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 13:58:07,883 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-11 13:58:20,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1131370.0, ans=0.125 2024-08-11 13:58:21,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1131370.0, ans=10.0 2024-08-11 13:58:35,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11700, loss[loss=0.1303, beats_loss=0.009053, ecapa_loss=0.0002124, whisper_loss=0.1191, over 19991.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01138, ecapa_loss=0.0001979, whisper_loss=0.09365, over 3931253.32 frames. ], batch size: 77, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 13:58:35,597 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 13:58:41,608 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 13:58:44,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131470.0, ans=0.1 2024-08-11 13:59:03,337 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 13:59:30,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1131770.0, ans=0.0 2024-08-11 13:59:41,950 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 13:59:56,839 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11750, loss[loss=0.08678, beats_loss=0.01357, ecapa_loss=0.0001299, whisper_loss=0.07191, over 16400.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01143, ecapa_loss=0.0001979, whisper_loss=0.09411, over 3955152.53 frames. 
], batch size: 64, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:00:04,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+01 2.835e+01 3.323e+01 3.805e+01 1.328e+02, threshold=6.647e+01, percent-clipped=1.0 2024-08-11 14:00:27,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1132070.0, ans=0.1 2024-08-11 14:00:31,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1132170.0, ans=0.2 2024-08-11 14:01:00,955 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.179e+00 2024-08-11 14:01:07,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1132370.0, ans=0.125 2024-08-11 14:01:15,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11800, loss[loss=0.1117, beats_loss=0.009783, ecapa_loss=0.0002608, whisper_loss=0.09934, over 13871.00 frames. ], tot_loss[loss=0.1079, beats_loss=0.01139, ecapa_loss=0.0001969, whisper_loss=0.09459, over 3934750.81 frames. ], batch size: 59, lr: 7.74e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:01:15,850 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 14:01:17,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-11 14:01:22,567 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-11 14:01:42,735 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 14:01:44,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1132670.0, ans=0.2 2024-08-11 14:02:03,094 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 14:02:08,123 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-11 14:02:30,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11850, loss[loss=0.1058, beats_loss=0.01157, ecapa_loss=0.0001848, whisper_loss=0.09238, over 22276.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=0.0001956, whisper_loss=0.09363, over 3937027.81 frames. ], batch size: 90, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:02:34,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1132970.0, ans=0.025 2024-08-11 14:02:38,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.696e+01 3.020e+01 3.645e+01 5.662e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 14:02:38,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1132970.0, ans=0.0 2024-08-11 14:02:46,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1133070.0, ans=0.0 2024-08-11 14:02:49,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1133070.0, ans=0.1 2024-08-11 14:03:08,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1133170.0, ans=0.125 2024-08-11 14:03:20,458 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
18 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-11 14:03:45,013 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 14:03:46,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11900, loss[loss=0.1194, beats_loss=0.01102, ecapa_loss=0.0001794, whisper_loss=0.1066, over 23626.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01141, ecapa_loss=0.0001963, whisper_loss=0.09428, over 3961018.90 frames. ], batch size: 92, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:03:57,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1133470.0, ans=0.2 2024-08-11 14:04:02,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1133570.0, ans=0.2 2024-08-11 14:04:03,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1133570.0, ans=0.125 2024-08-11 14:04:06,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1133570.0, ans=0.2 2024-08-11 14:04:14,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1133570.0, ans=0.0 2024-08-11 14:04:30,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1133670.0, ans=0.2 2024-08-11 14:04:52,408 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 14:04:53,965 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 14:04:58,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1133870.0, ans=0.2 2024-08-11 14:05:04,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 11950, loss[loss=0.09146, beats_loss=0.01306, ecapa_loss=0.0001974, whisper_loss=0.07643, over 19709.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.0114, ecapa_loss=0.0001971, whisper_loss=0.09398, over 3921156.13 frames. ], batch size: 83, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:05:07,022 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 14:05:12,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.574e+01 2.891e+01 3.292e+01 6.091e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-11 14:05:14,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1133970.0, ans=0.125 2024-08-11 14:05:42,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1134170.0, ans=0.2 2024-08-11 14:06:06,908 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 14:06:15,919 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 14:06:24,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12000, loss[loss=0.1297, beats_loss=0.01024, ecapa_loss=0.0001914, whisper_loss=0.1176, over 23525.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01148, ecapa_loss=0.0001965, whisper_loss=0.09338, over 3914247.81 frames. 
], batch size: 94, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:06:24,372 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 14:07:03,247 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006428, whisper_loss=0.2514, over 922467.00 frames. 2024-08-11 14:07:22,419 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on SV_voxceleb1: loss=0.005208, beats_loss=0, ecapa_loss=0.0005208, whisper_loss=0, over 939242.00 frames. 2024-08-11 14:09:12,729 INFO [train_multi_KD3.py:1149] (0/4) Epoch 8, validation on AT_audioset: loss=0.02509, beats_loss=0.02509, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 14:09:12,733 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 14:09:19,465 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-11 14:09:23,999 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 14:09:50,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1134670.0, ans=0.0 2024-08-11 14:09:52,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1134670.0, ans=15.0 2024-08-11 14:09:56,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1134770.0, ans=0.95 2024-08-11 14:10:18,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1134870.0, ans=0.125 2024-08-11 14:10:18,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1134870.0, ans=0.0 2024-08-11 14:10:26,308 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12050, loss[loss=0.1016, beats_loss=0.01513, ecapa_loss=0.0001824, whisper_loss=0.08467, over 21994.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01156, ecapa_loss=0.0001952, whisper_loss=0.09277, over 3909790.23 frames. ], batch size: 91, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:10:32,165 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 14:10:34,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.739e+01 2.961e+01 3.556e+01 5.317e+01, threshold=5.922e+01, percent-clipped=0.0 2024-08-11 14:10:41,044 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 14:10:51,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1135070.0, ans=0.125 2024-08-11 14:10:53,649 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 14:11:00,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1135170.0, ans=0.125 2024-08-11 14:11:06,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-08-11 14:11:08,176 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-11 14:11:17,545 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-11 14:11:33,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1135370.0, ans=0.2 2024-08-11 14:11:38,002 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 14:11:41,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1135470.0, ans=0.125 2024-08-11 14:11:42,337 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12100, loss[loss=0.1052, beats_loss=0.01301, ecapa_loss=0.000156, whisper_loss=0.0906, over 23338.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01147, ecapa_loss=0.0001949, whisper_loss=0.09313, over 3879748.51 frames. 
], batch size: 91, lr: 7.73e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:11:42,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1135470.0, ans=0.0 2024-08-11 14:11:59,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1135570.0, ans=0.125 2024-08-11 14:12:01,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1135570.0, ans=0.0 2024-08-11 14:12:10,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.00 vs. limit=22.5 2024-08-11 14:12:11,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-08-11 14:12:11,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-11 14:12:23,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1135770.0, ans=0.125 2024-08-11 14:12:35,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0 2024-08-11 14:12:49,472 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 14:12:52,132 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12150, loss[loss=0.1076, beats_loss=0.008927, ecapa_loss=0.0002038, whisper_loss=0.09665, over 17477.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01139, ecapa_loss=0.000195, whisper_loss=0.09354, over 3870108.34 frames. 
], batch size: 69, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:12:52,358 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 14:12:56,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1135970.0, ans=0.035 2024-08-11 14:12:59,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.505e+01 2.843e+01 3.166e+01 1.229e+02, threshold=5.686e+01, percent-clipped=1.0 2024-08-11 14:13:02,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1135970.0, ans=0.125 2024-08-11 14:13:03,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1135970.0, ans=0.1 2024-08-11 14:13:06,227 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-11 14:13:34,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1136270.0, ans=0.125 2024-08-11 14:13:42,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1136270.0, ans=0.125 2024-08-11 14:13:48,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1136370.0, ans=0.125 2024-08-11 14:13:52,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-11 14:14:00,463 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12200, loss[loss=0.1027, beats_loss=0.01324, ecapa_loss=0.0001717, whisper_loss=0.08774, over 23757.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01149, ecapa_loss=0.0001938, whisper_loss=0.09278, over 3856906.70 frames. 
], batch size: 92, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:14:01,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2024-08-11 14:14:11,933 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 14:14:30,942 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 14:14:33,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1136670.0, ans=0.07 2024-08-11 14:14:34,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1136670.0, ans=0.5 2024-08-11 14:14:43,122 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 14:14:48,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1136770.0, ans=0.125 2024-08-11 14:14:59,570 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 14:15:09,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12250, loss[loss=0.1136, beats_loss=0.01052, ecapa_loss=0.0001958, whisper_loss=0.1011, over 14205.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01147, ecapa_loss=0.000193, whisper_loss=0.09318, over 3872950.51 frames. ], batch size: 55, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:15:16,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.710e+01 3.098e+01 3.529e+01 5.582e+01, threshold=6.197e+01, percent-clipped=0.0 2024-08-11 14:15:17,906 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-11 14:15:38,589 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-11 14:15:43,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=12.0 2024-08-11 14:15:47,927 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 14:15:50,809 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-11 14:15:56,998 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 14:16:01,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137270.0, ans=0.125 2024-08-11 14:16:07,982 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 14:16:19,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12300, loss[loss=0.1004, beats_loss=0.01141, ecapa_loss=0.0002226, whisper_loss=0.0868, over 19691.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01141, ecapa_loss=0.0001949, whisper_loss=0.09337, over 3902003.44 frames. 
], batch size: 81, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:16:31,646 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.426e-02 2024-08-11 14:16:44,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1137570.0, ans=0.0 2024-08-11 14:16:45,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1137670.0, ans=0.0 2024-08-11 14:16:45,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1137670.0, ans=0.125 2024-08-11 14:17:28,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12350, loss[loss=0.1039, beats_loss=0.01115, ecapa_loss=0.0002125, whisper_loss=0.0906, over 21971.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01137, ecapa_loss=0.0001963, whisper_loss=0.09302, over 3923915.63 frames. ], batch size: 87, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:17:32,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137970.0, ans=0.1 2024-08-11 14:17:36,214 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.078e+01 2.771e+01 3.079e+01 3.408e+01 5.279e+01, threshold=6.158e+01, percent-clipped=0.0 2024-08-11 14:17:41,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1137970.0, ans=0.2 2024-08-11 14:17:44,440 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 14:17:44,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1138070.0, ans=0.125 2024-08-11 14:17:51,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-11 14:18:07,510 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 14:18:17,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1138270.0, ans=0.1 2024-08-11 14:18:22,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1138270.0, ans=0.125 2024-08-11 14:18:23,410 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 14:18:24,829 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 14:18:25,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1138270.0, ans=0.125 2024-08-11 14:18:28,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1138370.0, ans=0.125 2024-08-11 14:18:33,991 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-11 14:18:36,964 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 14:18:41,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12400, loss[loss=0.0808, beats_loss=0.01446, ecapa_loss=0.0001426, whisper_loss=0.06491, over 21760.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01136, ecapa_loss=0.0001966, whisper_loss=0.09295, over 3905362.85 frames. ], batch size: 89, lr: 7.72e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:18:52,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=1138470.0, ans=15.0 2024-08-11 14:18:56,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1138570.0, ans=0.0 2024-08-11 14:19:00,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1138570.0, ans=0.04949747468305833 2024-08-11 14:19:03,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2024-08-11 14:19:13,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=12.0 2024-08-11 14:19:34,026 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-11 14:19:45,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1138870.0, ans=0.125 2024-08-11 14:19:48,088 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-11 14:19:50,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2024-08-11 14:19:51,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12450, loss[loss=0.09863, beats_loss=0.0106, ecapa_loss=0.0001974, whisper_loss=0.08605, over 17143.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01136, ecapa_loss=0.0001963, whisper_loss=0.0923, over 3910294.76 frames. 
], batch size: 67, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:19:54,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-11 14:19:59,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+01 2.783e+01 3.134e+01 3.561e+01 9.376e+01, threshold=6.268e+01, percent-clipped=1.0 2024-08-11 14:20:07,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1139070.0, ans=0.0 2024-08-11 14:20:13,974 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 14:20:25,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1139170.0, ans=0.1 2024-08-11 14:20:38,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-11 14:20:41,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2024-08-11 14:20:43,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1139270.0, ans=0.125 2024-08-11 14:20:57,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=8.533e-01 2024-08-11 14:21:04,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12500, loss[loss=0.08121, beats_loss=0.01388, ecapa_loss=0.0001914, whisper_loss=0.06542, over 16918.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01139, ecapa_loss=0.0001967, whisper_loss=0.09237, over 3888781.55 frames. 
], batch size: 72, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:21:06,900 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 14:21:29,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1139570.0, ans=0.0 2024-08-11 14:21:29,978 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 14:21:39,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139670.0, ans=0.1 2024-08-11 14:21:47,732 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 14:21:53,193 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 14:22:20,546 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12550, loss[loss=0.1246, beats_loss=0.00874, ecapa_loss=0.000225, whisper_loss=0.1136, over 23156.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01135, ecapa_loss=0.0001955, whisper_loss=0.09321, over 3915408.13 frames. 
], batch size: 91, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:22:27,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.780e+01 3.157e+01 3.733e+01 7.024e+01, threshold=6.315e+01, percent-clipped=2.0 2024-08-11 14:22:29,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139970.0, ans=0.1 2024-08-11 14:22:39,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1140070.0, ans=0.0 2024-08-11 14:22:39,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1140070.0, ans=0.125 2024-08-11 14:22:48,997 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 14:22:51,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1140170.0, ans=0.125 2024-08-11 14:22:51,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1140170.0, ans=0.125 2024-08-11 14:23:16,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1140270.0, ans=0.0 2024-08-11 14:23:21,792 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 14:23:29,876 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-11 14:23:34,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12600, loss[loss=0.08631, beats_loss=0.01237, ecapa_loss=0.0002458, whisper_loss=0.07149, over 21153.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01136, ecapa_loss=0.0001981, whisper_loss=0.09343, over 3924333.22 frames. 
], batch size: 92, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:23:41,659 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 14:23:49,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1140570.0, ans=0.0 2024-08-11 14:24:00,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1140570.0, ans=0.125 2024-08-11 14:24:09,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1140670.0, ans=0.0 2024-08-11 14:24:14,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2024-08-11 14:24:48,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12650, loss[loss=0.09633, beats_loss=0.01455, ecapa_loss=0.0001938, whisper_loss=0.07983, over 16010.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0001988, whisper_loss=0.09383, over 3928492.06 frames. 
], batch size: 66, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:24:54,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1140970.0, ans=0.0 2024-08-11 14:24:55,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.818e+01 3.225e+01 3.809e+01 6.974e+01, threshold=6.451e+01, percent-clipped=1.0 2024-08-11 14:24:57,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1140970.0, ans=0.1 2024-08-11 14:24:59,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1140970.0, ans=0.0 2024-08-11 14:25:21,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1141170.0, ans=0.09899494936611666 2024-08-11 14:25:22,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1141170.0, ans=0.125 2024-08-11 14:25:44,258 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 12 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 14:25:48,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1141370.0, ans=0.0 2024-08-11 14:26:00,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12700, loss[loss=0.1158, beats_loss=0.01071, ecapa_loss=0.0001512, whisper_loss=0.1036, over 19785.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01146, ecapa_loss=0.0001979, whisper_loss=0.09339, over 3895909.88 frames. 
], batch size: 74, lr: 7.71e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:26:21,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1141570.0, ans=0.125 2024-08-11 14:26:32,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-08-11 14:26:41,025 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.661e-01 2024-08-11 14:26:44,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1141770.0, ans=0.125 2024-08-11 14:26:55,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1141870.0, ans=0.0 2024-08-11 14:27:01,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1141870.0, ans=0.0 2024-08-11 14:27:10,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12750, loss[loss=0.113, beats_loss=0.01141, ecapa_loss=0.0001811, whisper_loss=0.09976, over 21331.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01149, ecapa_loss=0.0001973, whisper_loss=0.09354, over 3919227.95 frames. ], batch size: 83, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:27:17,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.661e+01 2.986e+01 3.443e+01 7.051e+01, threshold=5.972e+01, percent-clipped=1.0 2024-08-11 14:27:20,594 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 14:27:23,645 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 14:27:26,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1142070.0, ans=0.0 2024-08-11 14:27:36,066 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-11 14:27:36,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1142070.0, ans=0.125 2024-08-11 14:27:41,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1142170.0, ans=0.125 2024-08-11 14:27:44,390 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 14:27:53,005 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 31 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-11 14:27:53,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1142270.0, ans=0.2 2024-08-11 14:28:02,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-11 14:28:09,684 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 14:28:12,313 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 14:28:18,108 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-11 14:28:18,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1142370.0, ans=0.125 2024-08-11 14:28:20,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12800, loss[loss=0.09359, beats_loss=0.009874, ecapa_loss=0.0001856, whisper_loss=0.08186, over 16494.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01141, ecapa_loss=0.000199, whisper_loss=0.09343, over 3898293.90 frames. ], batch size: 64, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:28:23,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1142470.0, ans=0.1 2024-08-11 14:28:45,911 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-11 14:28:49,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2024-08-11 14:28:55,174 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 14:28:55,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1142670.0, ans=0.1 2024-08-11 14:28:59,423 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 14:29:31,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12850, loss[loss=0.1177, beats_loss=0.01172, ecapa_loss=0.0002243, whisper_loss=0.1038, over 21807.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01143, ecapa_loss=0.0001996, whisper_loss=0.09263, over 3842232.45 frames. ], batch size: 90, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:29:36,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-11 14:29:38,557 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+01 2.679e+01 2.923e+01 3.402e+01 6.033e+01, threshold=5.846e+01, percent-clipped=2.0 2024-08-11 14:29:43,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. 
limit=22.5 2024-08-11 14:29:50,371 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 14:30:00,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1143170.0, ans=0.0 2024-08-11 14:30:07,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1143170.0, ans=0.125 2024-08-11 14:30:12,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=15.0 2024-08-11 14:30:24,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1143270.0, ans=0.2 2024-08-11 14:30:34,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2024-08-11 14:30:40,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12900, loss[loss=0.09792, beats_loss=0.01134, ecapa_loss=0.0001761, whisper_loss=0.08482, over 18220.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01144, ecapa_loss=0.0001985, whisper_loss=0.09232, over 3824422.30 frames. ], batch size: 73, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:30:40,999 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-11 14:30:49,052 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 14:30:57,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.50 vs. 
limit=22.5 2024-08-11 14:31:05,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1143570.0, ans=0.0 2024-08-11 14:31:08,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1143670.0, ans=0.125 2024-08-11 14:31:11,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.68 vs. limit=22.5 2024-08-11 14:31:37,674 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.717e+02 2024-08-11 14:31:48,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 12950, loss[loss=0.1168, beats_loss=0.01197, ecapa_loss=0.000134, whisper_loss=0.1035, over 17519.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01143, ecapa_loss=0.0001963, whisper_loss=0.09269, over 3853669.51 frames. ], batch size: 67, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:31:48,598 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 14:31:51,295 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-11 14:31:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1143970.0, ans=0.0 2024-08-11 14:31:54,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.619e+01 2.896e+01 3.261e+01 4.562e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-11 14:31:58,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1143970.0, ans=0.1 2024-08-11 14:32:07,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. 
limit=6.0 2024-08-11 14:32:08,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1144070.0, ans=0.0 2024-08-11 14:32:09,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1144070.0, ans=0.1 2024-08-11 14:32:28,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1144270.0, ans=0.1 2024-08-11 14:32:34,509 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-11 14:32:44,497 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-11 14:32:51,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=8.0 2024-08-11 14:32:55,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13000, loss[loss=0.08935, beats_loss=0.01482, ecapa_loss=0.0001821, whisper_loss=0.07271, over 22011.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01129, ecapa_loss=0.0001969, whisper_loss=0.09401, over 3930293.67 frames. ], batch size: 94, lr: 7.70e-03, grad_scale: 1.152921504606847e+18 2024-08-11 14:32:58,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-08-11 14:33:03,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1144470.0, ans=0.0 2024-08-11 14:33:07,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=12.0 2024-08-11 14:33:08,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1144570.0, ans=0.0 2024-08-11 14:33:12,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-08-11 14:33:18,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1144570.0, ans=0.125 2024-08-11 14:33:23,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1144670.0, ans=0.0 2024-08-11 14:33:30,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1144670.0, ans=0.125 2024-08-11 14:33:48,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1144870.0, ans=0.125 2024-08-11 14:33:49,816 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.059e+05 2024-08-11 14:33:57,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1144870.0, ans=0.1 2024-08-11 14:34:01,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13050, loss[loss=0.1099, beats_loss=0.01292, ecapa_loss=0.0001762, whisper_loss=0.09521, over 22624.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01124, ecapa_loss=0.0001979, whisper_loss=0.09387, over 3915120.40 frames. ], batch size: 92, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:34:09,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.257e+01 2.663e+01 3.009e+01 3.543e+01 5.736e+01, threshold=6.018e+01, percent-clipped=0.0 2024-08-11 14:34:16,149 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 14:34:19,021 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 14:34:19,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1145070.0, ans=0.125 2024-08-11 14:34:23,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1145070.0, ans=0.125 2024-08-11 14:34:25,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-11 14:34:43,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1145270.0, ans=0.0 2024-08-11 14:34:47,116 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 14:35:05,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1145370.0, ans=0.125 2024-08-11 14:35:08,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13100, loss[loss=0.1079, beats_loss=0.01094, ecapa_loss=0.0001511, whisper_loss=0.0955, over 20745.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01124, ecapa_loss=0.0001969, whisper_loss=0.09351, over 3912854.39 frames. ], batch size: 78, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:35:48,146 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 14:36:03,221 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 14:36:07,217 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 14:36:08,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1145870.0, ans=0.125 2024-08-11 14:36:09,636 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 14:36:12,363 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 14:36:16,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13150, loss[loss=0.0766, beats_loss=0.01358, ecapa_loss=0.0002069, whisper_loss=0.06096, over 16736.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01139, ecapa_loss=0.0001957, whisper_loss=0.0929, over 3934574.70 frames. ], batch size: 69, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:36:17,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1145970.0, ans=0.02 2024-08-11 14:36:24,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.646e+01 3.074e+01 3.551e+01 7.415e+01, threshold=6.148e+01, percent-clipped=1.0 2024-08-11 14:36:24,642 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 14:36:33,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1146070.0, ans=0.1 2024-08-11 14:37:01,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1146270.0, ans=0.1 2024-08-11 14:37:04,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1146270.0, ans=0.0 2024-08-11 14:37:06,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1146270.0, ans=0.0 2024-08-11 14:37:06,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1146270.0, ans=0.125 2024-08-11 14:37:06,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-11 14:37:13,225 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-11 14:37:14,517 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 14:37:25,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13200, loss[loss=0.1159, beats_loss=0.01094, ecapa_loss=0.0001938, whisper_loss=0.1031, over 22234.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01138, ecapa_loss=0.0001961, whisper_loss=0.09315, over 3953181.84 frames. 
], batch size: 90, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:37:29,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1146470.0, ans=0.125 2024-08-11 14:37:32,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-08-11 14:37:33,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1146470.0, ans=0.0 2024-08-11 14:37:41,405 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 14:37:43,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5 2024-08-11 14:37:44,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1146570.0, ans=0.1 2024-08-11 14:38:01,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1146670.0, ans=0.125 2024-08-11 14:38:30,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1146970.0, ans=0.2 2024-08-11 14:38:31,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13250, loss[loss=0.1171, beats_loss=0.01203, ecapa_loss=0.0001797, whisper_loss=0.1033, over 21851.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01138, ecapa_loss=0.0001941, whisper_loss=0.09347, over 3925659.96 frames. ], batch size: 84, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:38:33,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.73 vs. 
limit=22.5 2024-08-11 14:38:39,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.719e+01 3.002e+01 3.497e+01 5.724e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 14:38:41,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1146970.0, ans=0.1 2024-08-11 14:38:42,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1146970.0, ans=0.0 2024-08-11 14:38:48,169 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 14:38:49,464 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 28 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-11 14:39:12,392 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 14:39:24,574 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.783e-02 2024-08-11 14:39:38,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13300, loss[loss=0.1068, beats_loss=0.01319, ecapa_loss=0.0001556, whisper_loss=0.092, over 22990.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0113, ecapa_loss=0.0001956, whisper_loss=0.09297, over 3896053.59 frames. ], batch size: 92, lr: 7.69e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:39:43,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1147470.0, ans=0.125 2024-08-11 14:40:09,485 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 14:40:21,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1147770.0, ans=0.125 2024-08-11 14:40:28,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-08-11 14:40:37,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2024-08-11 14:40:38,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1147870.0, ans=0.1 2024-08-11 14:40:44,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13350, loss[loss=0.1449, beats_loss=0.007079, ecapa_loss=0.0001936, whisper_loss=0.1358, over 15899.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01142, ecapa_loss=0.0001936, whisper_loss=0.09273, over 3904251.84 frames. ], batch size: 60, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:40:50,313 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 14:40:52,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1147970.0, ans=0.035 2024-08-11 14:40:53,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.881e+01 3.191e+01 3.673e+01 5.435e+01, threshold=6.381e+01, percent-clipped=0.0 2024-08-11 14:40:55,999 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 14:41:12,637 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
21 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 14:41:12,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1148170.0, ans=0.125 2024-08-11 14:41:16,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1148170.0, ans=0.125 2024-08-11 14:41:41,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1148370.0, ans=12.0 2024-08-11 14:41:50,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1148370.0, ans=0.125 2024-08-11 14:41:52,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13400, loss[loss=0.1056, beats_loss=0.01115, ecapa_loss=0.0002066, whisper_loss=0.09242, over 22714.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0001951, whisper_loss=0.09288, over 3905913.01 frames. ], batch size: 90, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:42:02,010 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-11 14:42:20,649 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-11 14:42:21,964 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-11 14:42:35,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1148770.0, ans=0.125 2024-08-11 14:42:41,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1148770.0, ans=0.1 2024-08-11 14:42:43,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1148770.0, ans=0.0 2024-08-11 14:42:45,819 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 14:42:46,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-08-11 14:42:51,111 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 14:42:54,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1148870.0, ans=0.125 2024-08-11 14:42:59,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13450, loss[loss=0.09668, beats_loss=0.01266, ecapa_loss=0.0001333, whisper_loss=0.0827, over 22340.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01142, ecapa_loss=0.0001952, whisper_loss=0.09259, over 3916395.50 frames. ], batch size: 85, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:42:59,452 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-11 14:43:07,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.176e+01 2.672e+01 2.998e+01 3.496e+01 5.811e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 14:43:16,923 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 14:43:31,439 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 14:43:38,631 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 14:43:38,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1149270.0, ans=0.1 2024-08-11 14:43:42,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1149270.0, ans=0.1 2024-08-11 14:43:49,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1149270.0, ans=0.125 2024-08-11 14:43:50,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1149270.0, ans=0.125 2024-08-11 14:43:52,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1149370.0, ans=0.0 2024-08-11 14:43:59,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1149370.0, ans=0.1 2024-08-11 14:44:01,864 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 14:44:02,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1149370.0, ans=0.1 2024-08-11 14:44:06,862 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13500, loss[loss=0.09793, beats_loss=0.01652, ecapa_loss=0.0001881, whisper_loss=0.07953, over 20239.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01148, ecapa_loss=0.0001963, whisper_loss=0.09204, over 3878122.57 frames. ], batch size: 86, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:44:16,565 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 14:44:21,074 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 14:44:21,395 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.562e-02 2024-08-11 14:44:35,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1149670.0, ans=0.125 2024-08-11 14:44:56,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1149770.0, ans=0.0 2024-08-11 14:45:02,958 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 14:45:04,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1149870.0, ans=0.125 2024-08-11 14:45:07,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1149870.0, ans=0.2 2024-08-11 14:45:10,986 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-11 14:45:13,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13550, loss[loss=0.09004, beats_loss=0.01442, ecapa_loss=0.0001945, whisper_loss=0.07367, over 22396.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01153, ecapa_loss=0.0001965, whisper_loss=0.09183, over 3866405.99 frames. ], batch size: 95, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:45:14,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=15.0 2024-08-11 14:45:18,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1149970.0, ans=0.125 2024-08-11 14:45:22,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.724e+01 3.026e+01 3.356e+01 6.368e+01, threshold=6.052e+01, percent-clipped=1.0 2024-08-11 14:45:45,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1150170.0, ans=0.2 2024-08-11 14:45:56,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-08-11 14:45:59,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1150270.0, ans=0.1 2024-08-11 14:46:11,412 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-11 14:46:18,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-11 14:46:20,787 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13600, loss[loss=0.109, beats_loss=0.01231, ecapa_loss=0.0001881, whisper_loss=0.09481, over 22453.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01158, ecapa_loss=0.0001949, whisper_loss=0.09128, over 3876428.05 frames. ], batch size: 91, lr: 7.68e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:46:27,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1150470.0, ans=0.0 2024-08-11 14:46:28,725 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 14:46:31,677 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-11 14:47:21,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-11 14:47:26,207 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 14:47:26,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.57 vs. limit=22.5 2024-08-11 14:47:27,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13650, loss[loss=0.1036, beats_loss=0.01128, ecapa_loss=0.0001931, whisper_loss=0.09042, over 17033.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01156, ecapa_loss=0.0001951, whisper_loss=0.09116, over 3859937.84 frames. ], batch size: 66, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:47:32,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1150970.0, ans=0.0 2024-08-11 14:47:34,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.952e+01 3.395e+01 3.813e+01 5.359e+01, threshold=6.790e+01, percent-clipped=0.0 2024-08-11 14:47:35,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-11 14:47:36,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1150970.0, ans=0.125 2024-08-11 14:47:37,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1150970.0, ans=0.0 2024-08-11 14:47:43,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.47 vs. 
limit=15.0 2024-08-11 14:47:49,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1151070.0, ans=0.125 2024-08-11 14:47:58,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1151170.0, ans=0.125 2024-08-11 14:47:58,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1151170.0, ans=0.125 2024-08-11 14:48:05,017 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-11 14:48:07,783 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 14:48:08,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1151270.0, ans=0.125 2024-08-11 14:48:26,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1151370.0, ans=0.0 2024-08-11 14:48:30,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1151370.0, ans=0.125 2024-08-11 14:48:31,736 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 14:48:34,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13700, loss[loss=0.1057, beats_loss=0.01052, ecapa_loss=0.0002209, whisper_loss=0.09296, over 18002.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0115, ecapa_loss=0.0001957, whisper_loss=0.09222, over 3902986.53 frames. ], batch size: 73, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:48:43,022 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 14:48:51,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1151570.0, ans=0.125 2024-08-11 14:49:00,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1151670.0, ans=0.125 2024-08-11 14:49:16,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1151770.0, ans=0.1 2024-08-11 14:49:24,686 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 14:49:27,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1151870.0, ans=0.0 2024-08-11 14:49:27,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1151870.0, ans=0.0 2024-08-11 14:49:41,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13750, loss[loss=0.1005, beats_loss=0.01291, ecapa_loss=0.0001618, whisper_loss=0.08598, over 20371.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01147, ecapa_loss=0.0001953, whisper_loss=0.09264, over 3902832.13 frames. ], batch size: 82, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:49:49,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.564e+01 2.884e+01 3.394e+01 1.263e+02, threshold=5.769e+01, percent-clipped=1.0 2024-08-11 14:49:53,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-11 14:50:06,304 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-11 14:50:42,159 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 14:50:47,441 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-11 14:50:48,444 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13800, loss[loss=0.107, beats_loss=0.00839, ecapa_loss=0.0002403, whisper_loss=0.09617, over 23184.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01133, ecapa_loss=0.0001955, whisper_loss=0.09315, over 3878606.63 frames. ], batch size: 94, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:50:56,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-08-11 14:51:04,815 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 14:51:08,863 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 14:51:09,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1152570.0, ans=0.1 2024-08-11 14:51:31,486 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 14:51:33,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1152770.0, ans=0.025 2024-08-11 14:51:36,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1152770.0, ans=0.125 2024-08-11 14:51:37,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=12.0 2024-08-11 14:51:39,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1152770.0, ans=0.0 2024-08-11 14:51:40,725 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 14:51:43,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=12.0 2024-08-11 14:51:45,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-11 14:51:55,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13850, loss[loss=0.1147, beats_loss=0.01105, ecapa_loss=0.0001894, whisper_loss=0.1018, over 22284.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01123, ecapa_loss=0.0001977, whisper_loss=0.09364, over 3887891.56 frames. ], batch size: 88, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:51:55,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2024-08-11 14:51:56,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1152970.0, ans=0.0 2024-08-11 14:52:02,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1152970.0, ans=0.1 2024-08-11 14:52:03,069 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.659e+01 3.124e+01 3.574e+01 6.862e+01, threshold=6.248e+01, percent-clipped=1.0 2024-08-11 14:52:04,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1152970.0, ans=0.1 2024-08-11 14:52:24,380 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 14:52:29,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.62 vs. 
limit=22.5 2024-08-11 14:52:31,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1153170.0, ans=0.1 2024-08-11 14:52:37,258 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 14:52:49,754 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 14:53:01,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13900, loss[loss=0.1013, beats_loss=0.01354, ecapa_loss=0.0001949, whisper_loss=0.08579, over 15430.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0112, ecapa_loss=0.0001965, whisper_loss=0.09376, over 3898498.28 frames. ], batch size: 64, lr: 7.67e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:53:01,801 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 14:53:21,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1153570.0, ans=0.0 2024-08-11 14:53:22,831 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-11 14:53:40,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1153770.0, ans=0.2 2024-08-11 14:53:47,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1153770.0, ans=0.125 2024-08-11 14:54:07,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 13950, loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.0001885, whisper_loss=0.09307, over 20652.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01113, ecapa_loss=0.0001971, whisper_loss=0.09417, over 3889145.91 frames. 
], batch size: 82, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:54:08,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1153970.0, ans=0.0 2024-08-11 14:54:15,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.781e+01 3.096e+01 3.577e+01 5.485e+01, threshold=6.193e+01, percent-clipped=0.0 2024-08-11 14:54:16,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1153970.0, ans=0.2 2024-08-11 14:54:18,586 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 14:54:21,105 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-11 14:54:25,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-08-11 14:54:26,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1154070.0, ans=0.0 2024-08-11 14:54:35,736 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 11 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 14:54:36,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1154170.0, ans=0.125 2024-08-11 14:54:37,074 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-11 14:54:42,691 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 14:54:49,911 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-11 14:54:57,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1154270.0, ans=0.1 2024-08-11 14:54:58,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=1154270.0, ans=0.2 2024-08-11 14:55:02,225 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 14:55:13,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2024-08-11 14:55:13,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0 2024-08-11 14:55:16,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14000, loss[loss=0.1209, beats_loss=0.009746, ecapa_loss=0.0002144, whisper_loss=0.109, over 20188.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01109, ecapa_loss=0.0001977, whisper_loss=0.09449, over 3869357.35 frames. ], batch size: 81, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:55:18,032 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 14:55:23,185 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-11 14:55:24,821 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 14:55:26,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1154470.0, ans=0.125 2024-08-11 14:55:33,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1154570.0, ans=0.0 2024-08-11 14:55:35,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1154570.0, ans=0.125 2024-08-11 14:55:50,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1154670.0, ans=0.2 2024-08-11 14:55:54,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1154670.0, ans=0.125 2024-08-11 14:55:58,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1154770.0, ans=0.2 2024-08-11 14:56:09,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-11 14:56:16,070 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 14:56:26,002 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 14:56:27,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14050, loss[loss=0.1224, beats_loss=0.01048, ecapa_loss=0.000161, whisper_loss=0.1103, over 22691.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01109, ecapa_loss=0.0001977, whisper_loss=0.0945, over 3894379.23 frames. 
], batch size: 87, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:56:36,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.747e+01 3.034e+01 3.556e+01 6.486e+01, threshold=6.067e+01, percent-clipped=1.0 2024-08-11 14:56:44,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1155070.0, ans=0.125 2024-08-11 14:56:52,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1155070.0, ans=0.2 2024-08-11 14:56:52,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1155070.0, ans=0.1 2024-08-11 14:57:15,667 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 14:57:36,633 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 14:57:39,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1155370.0, ans=0.025 2024-08-11 14:57:43,621 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14100, loss[loss=0.09037, beats_loss=0.01103, ecapa_loss=0.0002054, whisper_loss=0.07729, over 14123.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01116, ecapa_loss=0.0001965, whisper_loss=0.09422, over 3853428.44 frames. ], batch size: 60, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:57:44,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-11 14:57:52,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. 
limit=15.0 2024-08-11 14:58:28,193 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 14:58:38,209 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-11 14:58:47,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1155870.0, ans=0.0 2024-08-11 14:58:53,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1155870.0, ans=0.0 2024-08-11 14:58:59,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14150, loss[loss=0.09755, beats_loss=0.01134, ecapa_loss=0.0002075, whisper_loss=0.08413, over 22028.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01121, ecapa_loss=0.0001956, whisper_loss=0.09368, over 3869101.27 frames. ], batch size: 92, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 14:59:01,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1155970.0, ans=0.0 2024-08-11 14:59:08,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.682e+01 3.045e+01 3.525e+01 6.405e+01, threshold=6.090e+01, percent-clipped=1.0 2024-08-11 14:59:33,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1156170.0, ans=0.125 2024-08-11 14:59:33,986 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.177e-01 2024-08-11 14:59:38,474 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-11 14:59:43,824 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 14:59:45,297 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-11 14:59:47,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1156270.0, ans=0.1 2024-08-11 14:59:54,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1156270.0, ans=0.125 2024-08-11 14:59:58,745 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-11 15:00:13,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1156370.0, ans=0.125 2024-08-11 15:00:17,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14200, loss[loss=0.1327, beats_loss=0.01101, ecapa_loss=0.0001689, whisper_loss=0.12, over 22968.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01122, ecapa_loss=0.0001968, whisper_loss=0.09319, over 3849645.27 frames. ], batch size: 88, lr: 7.66e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:00:32,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1156570.0, ans=0.1 2024-08-11 15:00:34,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-11 15:00:46,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1156670.0, ans=0.0 2024-08-11 15:00:52,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1156670.0, ans=0.0 2024-08-11 15:00:56,381 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 15:01:17,941 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 15:01:28,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-08-11 15:01:32,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14250, loss[loss=0.06661, beats_loss=0.01413, ecapa_loss=0.0001706, whisper_loss=0.05077, over 13974.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01121, ecapa_loss=0.0001966, whisper_loss=0.0939, over 3855484.58 frames. ], batch size: 57, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:01:43,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.820e+01 3.214e+01 3.813e+01 8.671e+01, threshold=6.428e+01, percent-clipped=3.0 2024-08-11 15:01:47,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1156970.0, ans=0.0 2024-08-11 15:01:52,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1157070.0, ans=0.0 2024-08-11 15:01:54,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0 2024-08-11 15:01:55,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1157070.0, ans=0.07 2024-08-11 15:01:59,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1157070.0, ans=0.1 2024-08-11 15:02:05,721 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-11 15:02:09,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1157170.0, ans=0.0 2024-08-11 15:02:21,831 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 15:02:39,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1157370.0, ans=0.09899494936611666 2024-08-11 15:02:43,776 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 15:02:51,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1157470.0, ans=0.0 2024-08-11 15:02:52,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14300, loss[loss=0.09907, beats_loss=0.01067, ecapa_loss=0.000207, whisper_loss=0.08632, over 18667.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01127, ecapa_loss=0.000195, whisper_loss=0.09338, over 3888711.69 frames. ], batch size: 75, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:02:54,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1157470.0, ans=0.125 2024-08-11 15:03:02,018 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-11 15:03:06,596 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 15:03:10,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0 2024-08-11 15:03:31,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1157670.0, ans=0.1 2024-08-11 15:04:07,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14350, loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0002225, whisper_loss=0.09115, over 22586.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001954, whisper_loss=0.09303, over 3913901.19 frames. 
], batch size: 92, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:04:12,380 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 15:04:15,147 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 15:04:16,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.155e+01 2.896e+01 3.266e+01 3.801e+01 1.000e+02, threshold=6.532e+01, percent-clipped=2.0 2024-08-11 15:04:29,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1158070.0, ans=0.125 2024-08-11 15:04:45,128 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 15:04:59,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1158270.0, ans=0.05 2024-08-11 15:05:19,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1158370.0, ans=0.1 2024-08-11 15:05:19,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1158370.0, ans=0.125 2024-08-11 15:05:19,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1158370.0, ans=0.2 2024-08-11 15:05:23,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0 2024-08-11 15:05:23,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14400, loss[loss=0.1176, beats_loss=0.01161, ecapa_loss=0.0001789, whisper_loss=0.1042, over 23762.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01127, ecapa_loss=0.0001967, whisper_loss=0.09423, over 3948830.57 frames. 
], batch size: 91, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:05:42,733 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-11 15:05:59,370 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 15:06:14,458 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 15:06:14,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1158770.0, ans=0.0 2024-08-11 15:06:21,761 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-11 15:06:23,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=12.0 2024-08-11 15:06:24,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1158870.0, ans=0.0 2024-08-11 15:06:39,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 8, batch 14450, loss[loss=0.1121, beats_loss=0.008694, ecapa_loss=0.0002184, whisper_loss=0.1013, over 14831.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01131, ecapa_loss=0.0001981, whisper_loss=0.09387, over 3963369.42 frames. ], batch size: 57, lr: 7.65e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:06:45,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1158970.0, ans=0.125 2024-08-11 15:06:48,863 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+01 2.733e+01 3.088e+01 3.504e+01 7.570e+01, threshold=6.176e+01, percent-clipped=1.0 2024-08-11 15:07:27,668 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 15:07:42,806 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-8.pt 2024-08-11 15:08:19,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 0, loss[loss=0.1185, beats_loss=0.01142, ecapa_loss=0.0002027, whisper_loss=0.105, over 18503.00 frames. ], tot_loss[loss=0.1185, beats_loss=0.01142, ecapa_loss=0.0002027, whisper_loss=0.105, over 18503.00 frames. ], batch size: 72, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:08:19,712 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 15:08:56,606 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on ASR_libri: loss=0.2578, beats_loss=0, ecapa_loss=0.0006493, whisper_loss=0.2513, over 922467.00 frames. 2024-08-11 15:09:15,629 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on SV_voxceleb1: loss=0.005328, beats_loss=0, ecapa_loss=0.0005328, whisper_loss=0, over 939242.00 frames. 2024-08-11 15:11:18,925 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on AT_audioset: loss=0.0249, beats_loss=0.0249, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 15:11:18,930 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 15:11:19,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1159380.0, ans=0.2 2024-08-11 15:11:20,403 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-11 15:11:33,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1159380.0, ans=0.125 2024-08-11 15:12:48,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1159580.0, ans=0.1 2024-08-11 15:13:20,073 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 from AS 2024-08-11 15:13:29,973 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 from AS 2024-08-11 15:13:41,782 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 from AS 2024-08-11 15:14:02,336 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 from AS 2024-08-11 15:14:02,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1159780.0, ans=0.125 2024-08-11 15:14:11,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1159780.0, ans=0.125 2024-08-11 15:14:32,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 50, loss[loss=0.0926, beats_loss=0.01039, ecapa_loss=0.0001927, whisper_loss=0.08028, over 15625.00 frames. ], tot_loss[loss=0.1086, beats_loss=0.01051, ecapa_loss=0.0002015, whisper_loss=0.09605, over 898433.76 frames. ], batch size: 62, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:14:43,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. 
limit=15.0 2024-08-11 15:15:31,869 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-116000.pt 2024-08-11 15:15:49,181 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-11 15:15:52,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.906e+01 3.207e+01 3.715e+01 5.089e+01, threshold=6.415e+01, percent-clipped=0.0 2024-08-11 15:16:46,572 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 from AS 2024-08-11 15:17:25,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1160180.0, ans=0.0 2024-08-11 15:19:03,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 100, loss[loss=0.1042, beats_loss=0.01243, ecapa_loss=0.0002182, whisper_loss=0.08955, over 22245.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01073, ecapa_loss=0.0001974, whisper_loss=0.09408, over 1579548.98 frames. ], batch size: 90, lr: 7.24e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:19:19,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1160380.0, ans=0.09899494936611666 2024-08-11 15:19:29,453 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 from AS 2024-08-11 15:19:39,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-11 15:20:08,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1160480.0, ans=0.1 2024-08-11 15:21:35,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1160780.0, ans=0.125 2024-08-11 15:21:43,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1160780.0, ans=0.0 2024-08-11 15:21:49,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1160780.0, ans=10.0 2024-08-11 15:22:01,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2024-08-11 15:22:03,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 150, loss[loss=0.08393, beats_loss=0.01268, ecapa_loss=0.0001714, whisper_loss=0.06953, over 17283.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01077, ecapa_loss=0.0001957, whisper_loss=0.09237, over 2046976.88 frames. ], batch size: 70, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:22:06,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1160880.0, ans=0.1 2024-08-11 15:22:14,198 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 15:22:23,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1160880.0, ans=10.0 2024-08-11 15:22:43,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.986e+01 3.190e+01 3.682e+01 6.515e+01, threshold=6.380e+01, percent-clipped=1.0 2024-08-11 15:22:46,324 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.074e-01 2024-08-11 15:22:48,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-08-11 15:22:51,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.33 vs. limit=15.0 2024-08-11 15:22:54,103 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 from AS 2024-08-11 15:23:59,012 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 from AS 2024-08-11 15:24:01,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1161380.0, ans=0.125 2024-08-11 15:24:02,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 200, loss[loss=0.1217, beats_loss=0.009493, ecapa_loss=0.0002013, whisper_loss=0.1102, over 20481.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0108, ecapa_loss=0.0001962, whisper_loss=0.09294, over 2428556.66 frames. ], batch size: 80, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:24:10,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2024-08-11 15:24:40,879 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 19 from Vox, 25 from AS 2024-08-11 15:24:41,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1161580.0, ans=0.125 2024-08-11 15:24:42,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=1161580.0, ans=0.02 2024-08-11 15:24:46,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1161580.0, ans=0.2 2024-08-11 15:25:12,734 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 14 from Vox, 42 from AS 2024-08-11 15:25:16,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1161780.0, ans=0.5 2024-08-11 15:25:23,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1161780.0, ans=0.2 2024-08-11 15:25:25,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1161780.0, ans=0.0 2024-08-11 15:25:30,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1161780.0, ans=0.1 2024-08-11 15:25:35,830 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 250, loss[loss=0.1069, beats_loss=0.01101, ecapa_loss=0.0001625, whisper_loss=0.09429, over 18976.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01093, ecapa_loss=0.0001968, whisper_loss=0.09277, over 2720097.47 frames. 
], batch size: 72, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:26:05,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.655e+01 2.964e+01 3.308e+01 4.229e+01, threshold=5.928e+01, percent-clipped=0.0 2024-08-11 15:26:32,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1162080.0, ans=0.1 2024-08-11 15:26:35,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1162080.0, ans=0.125 2024-08-11 15:26:54,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.72 vs. limit=10.0 2024-08-11 15:27:21,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 300, loss[loss=0.1208, beats_loss=0.01118, ecapa_loss=0.0001765, whisper_loss=0.1078, over 20513.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01086, ecapa_loss=0.0001972, whisper_loss=0.09319, over 2994578.67 frames. ], batch size: 82, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:27:22,316 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS 2024-08-11 15:27:25,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-08-11 15:27:33,310 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-11 15:28:04,535 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 26 from Vox, 30 from AS 2024-08-11 15:28:09,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1162680.0, ans=0.09899494936611666 2024-08-11 15:28:26,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1162780.0, ans=0.125 2024-08-11 15:28:34,776 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS 2024-08-11 15:28:39,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 350, loss[loss=0.1191, beats_loss=0.01045, ecapa_loss=0.0001671, whisper_loss=0.1069, over 23593.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001975, whisper_loss=0.09272, over 3155518.50 frames. ], batch size: 92, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:28:53,981 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 28 from Vox, 46 from AS 2024-08-11 15:29:00,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.571e+01 3.026e+01 3.460e+01 5.079e+01, threshold=6.051e+01, percent-clipped=0.0 2024-08-11 15:29:24,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1163180.0, ans=0.1 2024-08-11 15:29:26,731 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 from AS 2024-08-11 15:29:27,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1163180.0, ans=0.125 2024-08-11 15:29:32,743 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 from AS 2024-08-11 15:29:34,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. 
limit=12.0 2024-08-11 15:29:50,811 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 400, loss[loss=0.1035, beats_loss=0.01139, ecapa_loss=0.0001868, whisper_loss=0.09029, over 19463.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01097, ecapa_loss=0.0001967, whisper_loss=0.093, over 3321367.71 frames. ], batch size: 75, lr: 7.23e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:29:56,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1163380.0, ans=0.125 2024-08-11 15:30:03,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1163480.0, ans=0.125 2024-08-11 15:30:06,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1163480.0, ans=0.125 2024-08-11 15:30:13,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2024-08-11 15:30:45,844 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 21 from Vox, 41 from AS 2024-08-11 15:30:47,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1163780.0, ans=0.0 2024-08-11 15:30:51,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1163780.0, ans=0.2 2024-08-11 15:31:01,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 450, loss[loss=0.0879, beats_loss=0.01264, ecapa_loss=0.0002237, whisper_loss=0.07302, over 16290.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01104, ecapa_loss=0.0001939, whisper_loss=0.09302, over 3455321.86 frames. ], batch size: 68, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:31:18,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.26 vs. 
limit=15.0 2024-08-11 15:31:22,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.636e+01 2.915e+01 3.353e+01 5.482e+01, threshold=5.829e+01, percent-clipped=0.0 2024-08-11 15:31:35,164 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS 2024-08-11 15:31:49,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1164180.0, ans=0.125 2024-08-11 15:32:04,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1164280.0, ans=0.05 2024-08-11 15:32:14,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 500, loss[loss=0.109, beats_loss=0.01161, ecapa_loss=0.0001608, whisper_loss=0.09579, over 14767.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0111, ecapa_loss=0.0001928, whisper_loss=0.09257, over 3536796.45 frames. ], batch size: 56, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:32:15,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1164380.0, ans=0.0 2024-08-11 15:32:27,584 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-11 15:32:32,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1164480.0, ans=0.0 2024-08-11 15:32:37,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1164480.0, ans=0.125 2024-08-11 15:33:08,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1164680.0, ans=0.0 2024-08-11 15:33:15,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.23 vs. 
limit=15.0 2024-08-11 15:33:24,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1164780.0, ans=0.125 2024-08-11 15:33:26,530 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 550, loss[loss=0.09606, beats_loss=0.0119, ecapa_loss=0.0001942, whisper_loss=0.08222, over 22781.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01106, ecapa_loss=0.0001927, whisper_loss=0.0927, over 3608559.64 frames. ], batch size: 91, lr: 7.22e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:33:28,079 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 from AS 2024-08-11 15:33:28,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1164880.0, ans=0.125 2024-08-11 15:33:48,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.637e+01 3.008e+01 3.365e+01 4.595e+01, threshold=6.017e+01, percent-clipped=0.0 2024-08-11 15:33:57,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1165080.0, ans=0.125 2024-08-11 15:33:58,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2024-08-11 15:34:07,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1165080.0, ans=0.125 2024-08-11 15:34:17,494 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 18 from Vox, 41 from AS 2024-08-11 15:34:27,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1165280.0, ans=0.1 2024-08-11 15:34:38,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 600, loss[loss=0.1036, beats_loss=0.01015, ecapa_loss=0.0001749, whisper_loss=0.09173, over 19225.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01111, ecapa_loss=0.0001923, whisper_loss=0.09269, over 3672501.28 frames. ], batch size: 73, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:34:46,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1165380.0, ans=0.0 2024-08-11 15:34:58,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1165480.0, ans=0.1 2024-08-11 15:35:06,996 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 23 from Vox, 35 from AS 2024-08-11 15:35:27,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1165680.0, ans=0.0 2024-08-11 15:35:32,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1165680.0, ans=0.1 2024-08-11 15:35:37,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1165780.0, ans=0.1 2024-08-11 15:35:44,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1165780.0, ans=0.125 2024-08-11 15:35:46,017 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 from AS 2024-08-11 15:35:49,230 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
26 from LS+wenet, 22 from Vox, 35 from AS 2024-08-11 15:35:53,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 650, loss[loss=0.09098, beats_loss=0.01143, ecapa_loss=0.0001639, whisper_loss=0.07791, over 17496.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001918, whisper_loss=0.09213, over 3683146.29 frames. ], batch size: 67, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:35:59,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1165880.0, ans=0.05 2024-08-11 15:36:05,652 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-11 15:36:16,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.692e+01 3.015e+01 3.566e+01 6.762e+01, threshold=6.030e+01, percent-clipped=2.0 2024-08-11 15:36:16,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1165980.0, ans=0.125 2024-08-11 15:36:34,562 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 from AS 2024-08-11 15:37:02,771 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 from AS 2024-08-11 15:37:13,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 700, loss[loss=0.09473, beats_loss=0.01402, ecapa_loss=0.0001618, whisper_loss=0.07909, over 21271.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01121, ecapa_loss=0.000191, whisper_loss=0.09212, over 3725915.21 frames. 
], batch size: 87, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:37:17,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1166380.0, ans=0.05 2024-08-11 15:37:31,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1166480.0, ans=0.125 2024-08-11 15:38:27,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1166780.0, ans=0.125 2024-08-11 15:38:35,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 750, loss[loss=0.1019, beats_loss=0.01122, ecapa_loss=0.0001747, whisper_loss=0.08893, over 22216.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01134, ecapa_loss=0.0001887, whisper_loss=0.09178, over 3760302.28 frames. ], batch size: 89, lr: 7.22e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:38:52,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1166980.0, ans=0.125 2024-08-11 15:38:55,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1166980.0, ans=0.2 2024-08-11 15:39:00,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.889e+01 3.485e+01 5.934e+01, threshold=5.777e+01, percent-clipped=0.0 2024-08-11 15:39:09,736 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 26 from Vox, 45 from AS 2024-08-11 15:39:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1167080.0, ans=0.1 2024-08-11 15:39:15,458 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
25 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 15:39:19,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1167080.0, ans=0.0 2024-08-11 15:39:36,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1167180.0, ans=0.1 2024-08-11 15:39:48,842 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS 2024-08-11 15:40:00,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 800, loss[loss=0.1055, beats_loss=0.01039, ecapa_loss=0.0001833, whisper_loss=0.09323, over 23602.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01132, ecapa_loss=0.0001873, whisper_loss=0.09172, over 3741301.36 frames. ], batch size: 93, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:40:01,950 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 14 from Vox, 36 from AS 2024-08-11 15:40:07,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-11 15:40:11,818 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 from AS 2024-08-11 15:40:28,251 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 33 from LS+wenet, 19 from Vox, 19 from AS 2024-08-11 15:40:43,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1167580.0, ans=0.125 2024-08-11 15:40:51,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1167680.0, ans=0.0 2024-08-11 15:41:03,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.17 vs. 
limit=15.0 2024-08-11 15:41:07,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1167780.0, ans=0.1 2024-08-11 15:41:09,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1167780.0, ans=0.2 2024-08-11 15:41:25,090 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 850, loss[loss=0.1047, beats_loss=0.0113, ecapa_loss=0.0001905, whisper_loss=0.0915, over 19492.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01125, ecapa_loss=0.0001876, whisper_loss=0.09199, over 3725950.17 frames. ], batch size: 79, lr: 7.21e-03, grad_scale: 1.152921504606847e+18 2024-08-11 15:41:42,097 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 from AS 2024-08-11 15:41:52,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.648e+01 3.009e+01 3.325e+01 6.049e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-11 15:41:53,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1167980.0, ans=0.0 2024-08-11 15:41:56,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1167980.0, ans=0.125 2024-08-11 15:42:12,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1168080.0, ans=0.0 2024-08-11 15:42:31,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1168280.0, ans=0.125 2024-08-11 15:42:50,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 900, loss[loss=0.1105, beats_loss=0.01173, ecapa_loss=0.0001778, whisper_loss=0.09698, over 22336.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01121, ecapa_loss=0.0001865, whisper_loss=0.09257, over 3764658.56 frames. 
], batch size: 87, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:43:25,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1168580.0, ans=0.125 2024-08-11 15:43:50,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-11 15:43:57,337 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-11 15:44:11,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.07 vs. limit=22.5 2024-08-11 15:44:15,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 950, loss[loss=0.09479, beats_loss=0.01225, ecapa_loss=0.0002149, whisper_loss=0.08039, over 14452.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01122, ecapa_loss=0.0001863, whisper_loss=0.09219, over 3719838.98 frames. ], batch size: 60, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:44:20,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-08-11 15:44:37,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-11 15:44:42,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.663e+01 2.966e+01 3.403e+01 1.009e+02, threshold=5.932e+01, percent-clipped=1.0 2024-08-11 15:44:54,635 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
25 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 15:44:59,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1169080.0, ans=0.125 2024-08-11 15:45:08,522 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 15:45:15,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1169180.0, ans=0.125 2024-08-11 15:45:37,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1000, loss[loss=0.09249, beats_loss=0.01336, ecapa_loss=0.0001857, whisper_loss=0.07726, over 22193.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01134, ecapa_loss=0.0001858, whisper_loss=0.09075, over 3717618.34 frames. ], batch size: 92, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:45:46,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1169380.0, ans=0.1 2024-08-11 15:45:48,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1169380.0, ans=0.0 2024-08-11 15:45:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1169480.0, ans=0.125 2024-08-11 15:46:03,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1169480.0, ans=0.125 2024-08-11 15:46:08,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1169580.0, ans=0.125 2024-08-11 15:46:20,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1169580.0, ans=0.0 2024-08-11 15:46:27,618 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1169680.0, ans=0.0 2024-08-11 15:46:42,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1169780.0, ans=0.125 2024-08-11 15:46:49,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1169780.0, ans=0.0 2024-08-11 15:47:01,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1050, loss[loss=0.09258, beats_loss=0.01301, ecapa_loss=0.0001442, whisper_loss=0.07813, over 17633.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01137, ecapa_loss=0.0001858, whisper_loss=0.09063, over 3741355.61 frames. ], batch size: 68, lr: 7.21e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:47:07,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1169880.0, ans=0.125 2024-08-11 15:47:14,117 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 15 from Vox, 42 from AS 2024-08-11 15:47:14,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1169880.0, ans=0.125 2024-08-11 15:47:29,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.579e+01 2.847e+01 3.241e+01 6.261e+01, threshold=5.695e+01, percent-clipped=1.0 2024-08-11 15:47:32,917 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 19 from Vox, 40 from AS 2024-08-11 15:47:45,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1170080.0, ans=0.125 2024-08-11 15:47:47,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.64 vs. 
limit=15.0 2024-08-11 15:47:59,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=1170180.0, ans=0.2 2024-08-11 15:48:10,304 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 15:48:14,633 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 15:48:19,173 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-11 15:48:27,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1170280.0, ans=0.125 2024-08-11 15:48:32,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1100, loss[loss=0.09299, beats_loss=0.01441, ecapa_loss=0.0001448, whisper_loss=0.07713, over 22509.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0114, ecapa_loss=0.0001852, whisper_loss=0.09061, over 3751369.45 frames. ], batch size: 90, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:48:43,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=12.0 2024-08-11 15:48:48,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1170480.0, ans=0.125 2024-08-11 15:48:50,538 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 15:49:05,039 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 15:49:38,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. 
limit=15.0 2024-08-11 15:49:58,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1150, loss[loss=0.0957, beats_loss=0.01394, ecapa_loss=0.0001675, whisper_loss=0.08009, over 22006.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01135, ecapa_loss=0.0001866, whisper_loss=0.09042, over 3735613.67 frames. ], batch size: 90, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:50:07,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1170880.0, ans=0.07 2024-08-11 15:50:10,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1170880.0, ans=0.1 2024-08-11 15:50:14,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1170980.0, ans=0.0 2024-08-11 15:50:25,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.574e+01 2.982e+01 3.415e+01 5.178e+01, threshold=5.965e+01, percent-clipped=0.0 2024-08-11 15:50:45,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1171080.0, ans=0.125 2024-08-11 15:51:01,410 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-11 15:51:09,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1171280.0, ans=0.125 2024-08-11 15:51:19,402 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 32 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 15:51:19,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1171380.0, ans=0.1 2024-08-11 15:51:20,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. 
limit=15.0 2024-08-11 15:51:20,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1200, loss[loss=0.1339, beats_loss=0.0107, ecapa_loss=0.0001452, whisper_loss=0.1217, over 20334.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01132, ecapa_loss=0.0001868, whisper_loss=0.09157, over 3748920.76 frames. ], batch size: 73, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:51:37,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1171480.0, ans=0.0 2024-08-11 15:51:45,042 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 17 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-11 15:51:59,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1171580.0, ans=0.125 2024-08-11 15:52:06,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2024-08-11 15:52:09,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1171680.0, ans=0.125 2024-08-11 15:52:30,711 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 15:52:42,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1250, loss[loss=0.1035, beats_loss=0.01346, ecapa_loss=0.0001879, whisper_loss=0.08814, over 21822.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01125, ecapa_loss=0.0001865, whisper_loss=0.09182, over 3755319.03 frames. 
], batch size: 92, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:52:45,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1171880.0, ans=0.0 2024-08-11 15:52:45,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1171880.0, ans=0.5 2024-08-11 15:52:47,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1171880.0, ans=0.2 2024-08-11 15:53:06,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1171980.0, ans=0.0 2024-08-11 15:53:07,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.593e+01 3.089e+01 3.473e+01 5.447e+01, threshold=6.177e+01, percent-clipped=0.0 2024-08-11 15:53:12,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1172080.0, ans=0.125 2024-08-11 15:53:12,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1172080.0, ans=0.125 2024-08-11 15:53:28,085 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 15:53:29,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1172180.0, ans=0.1 2024-08-11 15:53:39,028 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-11 15:54:02,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1300, loss[loss=0.1018, beats_loss=0.009497, ecapa_loss=0.0002245, whisper_loss=0.09006, over 17494.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01121, ecapa_loss=0.0001881, whisper_loss=0.09147, over 3780572.56 frames. 
], batch size: 70, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:54:04,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1172380.0, ans=0.125 2024-08-11 15:54:10,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1172380.0, ans=0.05 2024-08-11 15:54:18,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1172480.0, ans=0.0 2024-08-11 15:54:28,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1172480.0, ans=0.05 2024-08-11 15:54:33,228 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 15:54:38,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1172580.0, ans=0.125 2024-08-11 15:54:48,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1172680.0, ans=0.1 2024-08-11 15:55:00,340 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 15:55:02,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1172680.0, ans=10.0 2024-08-11 15:55:22,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1350, loss[loss=0.08003, beats_loss=0.01238, ecapa_loss=0.0002087, whisper_loss=0.06556, over 17022.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01125, ecapa_loss=0.0001867, whisper_loss=0.09116, over 3786603.03 frames. 
], batch size: 69, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:55:51,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.580e+01 3.028e+01 3.578e+01 5.392e+01, threshold=6.056e+01, percent-clipped=0.0 2024-08-11 15:55:53,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1172980.0, ans=0.0 2024-08-11 15:56:00,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1173080.0, ans=0.125 2024-08-11 15:56:01,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-11 15:56:06,898 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 15:56:14,177 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 15:56:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1173180.0, ans=0.125 2024-08-11 15:56:39,996 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 15:56:50,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1400, loss[loss=0.1009, beats_loss=0.01269, ecapa_loss=0.0001882, whisper_loss=0.08629, over 19514.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01115, ecapa_loss=0.0001872, whisper_loss=0.09224, over 3785083.31 frames. ], batch size: 80, lr: 7.20e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:56:59,829 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 15:57:01,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1173380.0, ans=0.0 2024-08-11 15:57:22,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2024-08-11 15:57:26,335 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 15:57:29,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1173580.0, ans=0.1 2024-08-11 15:57:33,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1173580.0, ans=0.125 2024-08-11 15:57:39,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1173680.0, ans=0.0 2024-08-11 15:57:42,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1173680.0, ans=0.125 2024-08-11 15:57:53,955 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 15:57:57,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1173780.0, ans=0.125 2024-08-11 15:58:02,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.74 vs. limit=22.5 2024-08-11 15:58:08,237 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-11 15:58:13,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1450, loss[loss=0.1057, beats_loss=0.01282, ecapa_loss=0.0001948, whisper_loss=0.09096, over 21900.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01118, ecapa_loss=0.0001866, whisper_loss=0.09169, over 3784885.34 frames. ], batch size: 90, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 15:58:58,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1173980.0, ans=0.1 2024-08-11 15:59:02,136 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 14 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 15:59:07,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.75 vs. limit=22.5 2024-08-11 15:59:09,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.580e+01 2.876e+01 3.331e+01 4.704e+01, threshold=5.752e+01, percent-clipped=0.0 2024-08-11 15:59:13,192 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-11 15:59:24,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-11 15:59:26,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1174080.0, ans=0.1 2024-08-11 15:59:39,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1174180.0, ans=0.125 2024-08-11 15:59:40,635 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 15:59:45,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1174180.0, ans=0.1 2024-08-11 16:00:03,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. 
limit=22.5 2024-08-11 16:00:06,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1500, loss[loss=0.08369, beats_loss=0.01419, ecapa_loss=0.0001542, whisper_loss=0.06795, over 22005.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01123, ecapa_loss=0.0001853, whisper_loss=0.09151, over 3833830.23 frames. ], batch size: 91, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:00:08,093 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-11 16:00:16,817 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-11 16:00:22,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1174480.0, ans=0.125 2024-08-11 16:01:12,706 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-11 16:01:26,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1550, loss[loss=0.1151, beats_loss=0.009588, ecapa_loss=0.0002032, whisper_loss=0.1035, over 22380.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01116, ecapa_loss=0.0001852, whisper_loss=0.09192, over 3834656.10 frames. ], batch size: 90, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:01:29,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=12.0 2024-08-11 16:01:32,986 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-11 16:01:43,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2024-08-11 16:01:45,118 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 16:01:52,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.587e+01 2.923e+01 3.490e+01 5.175e+01, threshold=5.845e+01, percent-clipped=0.0 2024-08-11 16:01:59,080 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 16:02:00,818 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:02:27,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1175280.0, ans=0.1 2024-08-11 16:02:38,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1175280.0, ans=0.0 2024-08-11 16:02:43,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1600, loss[loss=0.097, beats_loss=0.008262, ecapa_loss=0.000221, whisper_loss=0.08653, over 16077.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001843, whisper_loss=0.09199, over 3822962.11 frames. 
], batch size: 62, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:02:48,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1175380.0, ans=0.2 2024-08-11 16:02:48,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1175380.0, ans=0.0 2024-08-11 16:02:49,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1175380.0, ans=0.0 2024-08-11 16:02:56,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1175380.0, ans=0.125 2024-08-11 16:02:59,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1175480.0, ans=0.2 2024-08-11 16:03:15,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1175580.0, ans=0.1 2024-08-11 16:03:23,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1175580.0, ans=0.0 2024-08-11 16:03:36,714 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 16:03:36,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1175680.0, ans=0.125 2024-08-11 16:03:41,662 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 33 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-11 16:03:43,028 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 16:03:48,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. 
limit=12.0 2024-08-11 16:04:00,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1650, loss[loss=0.09897, beats_loss=0.01354, ecapa_loss=0.000167, whisper_loss=0.08375, over 22562.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001854, whisper_loss=0.09207, over 3829677.90 frames. ], batch size: 91, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:04:05,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1175880.0, ans=0.05 2024-08-11 16:04:13,426 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 12 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 16:04:19,230 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 16:04:25,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.491e+01 2.765e+01 3.253e+01 5.216e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-11 16:04:40,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2024-08-11 16:04:46,715 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.358e-02 2024-08-11 16:04:53,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-08-11 16:04:59,687 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 16:05:07,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=15.0 2024-08-11 16:05:17,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1700, loss[loss=0.1085, beats_loss=0.01115, ecapa_loss=0.0002081, whisper_loss=0.09526, over 16273.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001862, whisper_loss=0.09207, over 3815567.61 frames. ], batch size: 66, lr: 7.19e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:05:35,265 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:05:45,936 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-11 16:05:53,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-08-11 16:06:06,988 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 16:06:20,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1176780.0, ans=0.0 2024-08-11 16:06:30,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1750, loss[loss=0.1023, beats_loss=0.0126, ecapa_loss=0.0001613, whisper_loss=0.08804, over 20050.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001849, whisper_loss=0.09163, over 3812834.16 frames. ], batch size: 78, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:06:32,352 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-11 16:06:34,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.78 vs. 
limit=10.0 2024-08-11 16:06:50,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1176980.0, ans=0.125 2024-08-11 16:06:53,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.634e+01 3.052e+01 3.436e+01 4.631e+01, threshold=6.105e+01, percent-clipped=0.0 2024-08-11 16:06:55,495 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 16:07:36,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1177280.0, ans=0.0 2024-08-11 16:07:42,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1800, loss[loss=0.09752, beats_loss=0.01156, ecapa_loss=0.0001951, whisper_loss=0.08401, over 19400.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01113, ecapa_loss=0.0001858, whisper_loss=0.09158, over 3800993.02 frames. ], batch size: 77, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:07:59,362 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-11 16:08:01,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1177480.0, ans=0.125 2024-08-11 16:08:01,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1177480.0, ans=15.0 2024-08-11 16:08:05,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=15.0 2024-08-11 16:08:12,378 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 16:08:15,250 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 16:08:25,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1177680.0, ans=0.0 2024-08-11 16:08:30,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1177680.0, ans=0.1 2024-08-11 16:08:34,513 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 11 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 16:08:36,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1177680.0, ans=0.1 2024-08-11 16:08:44,344 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 16:08:47,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-11 16:08:54,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1850, loss[loss=0.0978, beats_loss=0.01399, ecapa_loss=0.0001652, whisper_loss=0.08215, over 17701.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01115, ecapa_loss=0.0001868, whisper_loss=0.09135, over 3796094.76 frames. ], batch size: 69, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:08:59,223 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-11 16:09:12,828 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 16:09:18,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.020e+01 2.637e+01 3.046e+01 3.560e+01 5.616e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 16:09:24,273 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 20 from Vox, 11 fro AS 2024-08-11 16:09:47,787 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 16:09:48,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1178180.0, ans=0.125 2024-08-11 16:10:07,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1900, loss[loss=0.08185, beats_loss=0.01117, ecapa_loss=0.0002603, whisper_loss=0.06808, over 15766.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001879, whisper_loss=0.09161, over 3785345.30 frames. ], batch size: 69, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:10:18,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1178380.0, ans=0.5 2024-08-11 16:10:32,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-11 16:10:56,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1178680.0, ans=0.125 2024-08-11 16:11:22,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 1950, loss[loss=0.1146, beats_loss=0.00986, ecapa_loss=0.0001889, whisper_loss=0.1028, over 22978.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01113, ecapa_loss=0.0001878, whisper_loss=0.0915, over 3802615.72 frames. ], batch size: 90, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:11:25,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-11 16:11:29,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1178880.0, ans=0.125 2024-08-11 16:11:40,858 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 16:11:45,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.606e+01 2.950e+01 3.514e+01 8.174e+01, threshold=5.900e+01, percent-clipped=2.0 2024-08-11 16:11:53,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1179080.0, ans=0.1 2024-08-11 16:12:01,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. limit=22.5 2024-08-11 16:12:03,607 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-11 16:12:17,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1179180.0, ans=0.125 2024-08-11 16:12:36,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2000, loss[loss=0.1025, beats_loss=0.01082, ecapa_loss=0.0001919, whisper_loss=0.08978, over 20107.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01121, ecapa_loss=0.0001872, whisper_loss=0.09183, over 3833363.09 frames. ], batch size: 80, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:12:47,254 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
17 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 16:12:53,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1179480.0, ans=0.125 2024-08-11 16:13:04,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1179480.0, ans=0.125 2024-08-11 16:13:08,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1179580.0, ans=0.0 2024-08-11 16:13:12,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1179580.0, ans=0.1 2024-08-11 16:13:17,856 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 16:13:22,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1179680.0, ans=0.1 2024-08-11 16:13:37,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-08-11 16:13:46,348 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 16:13:53,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2050, loss[loss=0.107, beats_loss=0.01055, ecapa_loss=0.0001804, whisper_loss=0.09468, over 16720.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01124, ecapa_loss=0.000188, whisper_loss=0.09125, over 3828127.42 frames. 
], batch size: 63, lr: 7.18e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:13:57,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1179880.0, ans=0.1 2024-08-11 16:14:08,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1179980.0, ans=0.1 2024-08-11 16:14:18,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.671e+01 2.965e+01 3.227e+01 2.393e+02, threshold=5.931e+01, percent-clipped=1.0 2024-08-11 16:14:25,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1180080.0, ans=0.0 2024-08-11 16:14:28,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1180080.0, ans=0.1 2024-08-11 16:14:36,852 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-11 16:14:59,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-11 16:15:03,549 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:15:13,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1180380.0, ans=0.025 2024-08-11 16:15:14,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2100, loss[loss=0.123, beats_loss=0.007907, ecapa_loss=0.0002419, whisper_loss=0.1127, over 14060.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01129, ecapa_loss=0.0001874, whisper_loss=0.09146, over 3838663.75 frames. 
], batch size: 53, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:15:20,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1180380.0, ans=0.125 2024-08-11 16:15:39,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1180480.0, ans=0.125 2024-08-11 16:15:39,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1180480.0, ans=0.125 2024-08-11 16:15:46,186 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 16:15:46,472 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:15:46,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1180480.0, ans=0.0 2024-08-11 16:15:50,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1180580.0, ans=0.125 2024-08-11 16:15:59,307 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 16:16:07,304 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 16:16:36,233 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-11 16:16:37,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2150, loss[loss=0.1243, beats_loss=0.009398, ecapa_loss=0.0002256, whisper_loss=0.1126, over 17265.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01131, ecapa_loss=0.0001867, whisper_loss=0.09144, over 3819497.54 frames. ], batch size: 71, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:16:46,717 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
26 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 16:16:47,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-11 16:17:00,416 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 16:17:03,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.740e+01 2.984e+01 3.481e+01 5.761e+01, threshold=5.968e+01, percent-clipped=0.0 2024-08-11 16:17:21,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1181080.0, ans=0.125 2024-08-11 16:17:37,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1181180.0, ans=15.0 2024-08-11 16:17:40,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1181180.0, ans=0.0 2024-08-11 16:17:41,476 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 16:18:01,082 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-11 16:18:02,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2200, loss[loss=0.07527, beats_loss=0.01161, ecapa_loss=0.0002019, whisper_loss=0.06164, over 17757.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01126, ecapa_loss=0.0001879, whisper_loss=0.09147, over 3806302.46 frames. 
], batch size: 75, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:18:05,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1181380.0, ans=0.125 2024-08-11 16:18:13,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1181380.0, ans=0.0 2024-08-11 16:18:18,076 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-11 16:18:21,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1181480.0, ans=0.0 2024-08-11 16:18:21,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1181480.0, ans=0.125 2024-08-11 16:18:36,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1181580.0, ans=0.035 2024-08-11 16:18:36,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1181580.0, ans=0.07 2024-08-11 16:18:48,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1181580.0, ans=0.0 2024-08-11 16:18:53,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1181680.0, ans=0.125 2024-08-11 16:19:06,771 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 16:19:17,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1181780.0, ans=0.125 2024-08-11 16:19:20,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1181780.0, ans=0.1 2024-08-11 16:19:21,592 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-11 16:19:24,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2250, loss[loss=0.1031, beats_loss=0.01223, ecapa_loss=0.0002443, whisper_loss=0.08848, over 16142.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01129, ecapa_loss=0.0001885, whisper_loss=0.09246, over 3848857.66 frames. ], batch size: 69, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:19:34,519 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 16:19:44,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1181980.0, ans=0.0 2024-08-11 16:19:44,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1181980.0, ans=0.125 2024-08-11 16:19:49,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1181980.0, ans=0.125 2024-08-11 16:19:50,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.696e+01 3.022e+01 3.450e+01 8.988e+01, threshold=6.044e+01, percent-clipped=1.0 2024-08-11 16:20:01,194 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 18 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-11 16:20:17,235 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 16:20:29,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1182280.0, ans=0.125 2024-08-11 16:20:45,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2300, loss[loss=0.09907, beats_loss=0.009356, ecapa_loss=0.0002045, whisper_loss=0.08767, over 21869.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01124, ecapa_loss=0.0001899, whisper_loss=0.09306, over 3869069.82 frames. ], batch size: 89, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:20:58,002 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-11 16:20:59,779 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 16:21:17,642 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 16:21:17,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1182580.0, ans=0.125 2024-08-11 16:21:33,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-11 16:22:05,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2350, loss[loss=0.102, beats_loss=0.01282, ecapa_loss=0.0001996, whisper_loss=0.08721, over 18767.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01118, ecapa_loss=0.0001913, whisper_loss=0.09268, over 3878258.13 frames. ], batch size: 76, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:22:13,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1182880.0, ans=0.125 2024-08-11 16:22:23,454 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 16:22:30,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1182980.0, ans=10.0 2024-08-11 16:22:33,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1182980.0, ans=0.0 2024-08-11 16:22:34,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.605e+01 2.959e+01 3.391e+01 6.517e+01, threshold=5.918e+01, percent-clipped=1.0 2024-08-11 16:22:40,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1183080.0, ans=0.125 2024-08-11 16:22:44,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1183080.0, ans=0.0 2024-08-11 16:22:53,393 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 16:22:55,167 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 16:23:07,307 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 16:23:30,527 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2400, loss[loss=0.1235, beats_loss=0.01095, ecapa_loss=0.0001642, whisper_loss=0.1109, over 19240.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01109, ecapa_loss=0.0001908, whisper_loss=0.09294, over 3862182.43 frames. ], batch size: 71, lr: 7.17e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:23:37,670 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 16:23:39,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1183380.0, ans=0.1 2024-08-11 16:23:46,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1183480.0, ans=0.1 2024-08-11 16:24:00,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1183480.0, ans=0.0 2024-08-11 16:24:00,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-11 16:24:11,167 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-11 16:24:19,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1183680.0, ans=0.2 2024-08-11 16:24:24,480 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 16:24:26,147 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 16:24:34,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1183680.0, ans=0.0 2024-08-11 16:24:41,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-11 16:24:46,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1183780.0, ans=0.125 2024-08-11 16:24:47,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.90 vs. 
limit=12.0 2024-08-11 16:24:55,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2450, loss[loss=0.1092, beats_loss=0.01031, ecapa_loss=0.000214, whisper_loss=0.09673, over 22324.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0112, ecapa_loss=0.0001909, whisper_loss=0.09249, over 3862782.43 frames. ], batch size: 90, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:24:56,837 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 16:24:58,682 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 16:25:03,673 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 16:25:10,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1183980.0, ans=0.05 2024-08-11 16:25:17,870 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 16:25:20,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.638e+01 2.982e+01 3.407e+01 5.711e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 16:25:25,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2024-08-11 16:25:37,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1184080.0, ans=0.0 2024-08-11 16:26:11,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. 
limit=22.5 2024-08-11 16:26:16,759 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.847e+00 2024-08-11 16:26:18,245 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2500, loss[loss=0.09441, beats_loss=0.01278, ecapa_loss=0.0001879, whisper_loss=0.07976, over 22452.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01122, ecapa_loss=0.0001921, whisper_loss=0.09226, over 3837610.36 frames. ], batch size: 94, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:26:21,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1184380.0, ans=0.125 2024-08-11 16:26:28,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1184380.0, ans=0.1 2024-08-11 16:26:47,917 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-11 16:26:52,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1184580.0, ans=0.125 2024-08-11 16:26:54,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1184580.0, ans=0.0 2024-08-11 16:27:07,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-08-11 16:27:13,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1184680.0, ans=0.05 2024-08-11 16:27:21,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1184680.0, ans=0.1 2024-08-11 16:27:24,372 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-11 16:27:26,295 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:27:26,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1184780.0, ans=0.125 2024-08-11 16:27:45,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2550, loss[loss=0.08209, beats_loss=0.01392, ecapa_loss=0.0002026, whisper_loss=0.06615, over 18023.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01128, ecapa_loss=0.0001914, whisper_loss=0.09184, over 3861056.03 frames. ], batch size: 76, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:28:00,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2024-08-11 16:28:11,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.549e+01 2.871e+01 3.222e+01 4.395e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 16:28:12,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2024-08-11 16:28:55,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1185280.0, ans=0.125 2024-08-11 16:29:01,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2024-08-11 16:29:10,412 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2600, loss[loss=0.08558, beats_loss=0.0124, ecapa_loss=0.0002433, whisper_loss=0.07075, over 17592.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01134, ecapa_loss=0.0001913, whisper_loss=0.09131, over 3849990.53 frames. 
], batch size: 75, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:29:16,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-11 16:29:18,697 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-11 16:29:22,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-11 16:29:24,718 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 16:29:29,730 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 16:29:55,619 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-11 16:29:57,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1185580.0, ans=0.125 2024-08-11 16:30:08,290 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-11 16:30:12,529 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 16:30:19,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1185780.0, ans=0.0 2024-08-11 16:30:34,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2650, loss[loss=0.1132, beats_loss=0.01217, ecapa_loss=0.0001588, whisper_loss=0.09947, over 19111.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0113, ecapa_loss=0.0001922, whisper_loss=0.09157, over 3845038.89 frames. 
], batch size: 75, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:30:45,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1185880.0, ans=0.125 2024-08-11 16:31:01,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.668e+01 2.978e+01 3.517e+01 4.989e+01, threshold=5.956e+01, percent-clipped=0.0 2024-08-11 16:31:05,275 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 16:31:08,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1186080.0, ans=0.125 2024-08-11 16:31:46,409 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 16:31:49,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1186280.0, ans=0.0 2024-08-11 16:31:50,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=1186280.0, ans=0.1 2024-08-11 16:31:59,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2700, loss[loss=0.09657, beats_loss=0.009141, ecapa_loss=0.0002349, whisper_loss=0.08508, over 20537.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01134, ecapa_loss=0.000192, whisper_loss=0.0918, over 3861786.83 frames. ], batch size: 84, lr: 7.16e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:31:59,200 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 16:32:06,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1186380.0, ans=0.125 2024-08-11 16:32:08,475 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
33 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 16:32:30,021 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 16:32:32,037 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 16:32:47,395 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 16:32:53,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1186680.0, ans=0.0 2024-08-11 16:33:00,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1186680.0, ans=0.125 2024-08-11 16:33:03,324 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 16:33:08,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1186780.0, ans=0.125 2024-08-11 16:33:20,271 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2750, loss[loss=0.1027, beats_loss=0.01, ecapa_loss=0.0002091, whisper_loss=0.09065, over 14933.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01135, ecapa_loss=0.0001908, whisper_loss=0.09186, over 3860935.44 frames. 
], batch size: 59, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:33:22,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1186880.0, ans=0.125 2024-08-11 16:33:22,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.337e-01 2024-08-11 16:33:47,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.790e+01 3.167e+01 3.660e+01 5.593e+01, threshold=6.335e+01, percent-clipped=0.0 2024-08-11 16:33:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1186980.0, ans=0.1 2024-08-11 16:33:48,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1186980.0, ans=0.09899494936611666 2024-08-11 16:33:50,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1186980.0, ans=0.0 2024-08-11 16:33:59,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1187080.0, ans=0.0 2024-08-11 16:34:28,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1187280.0, ans=0.2 2024-08-11 16:34:31,656 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 16:34:39,827 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 29 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 16:34:42,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2800, loss[loss=0.09277, beats_loss=0.01215, ecapa_loss=0.0001962, whisper_loss=0.07866, over 17612.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.0001909, whisper_loss=0.09292, over 3843748.60 frames. 
], batch size: 71, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:34:56,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1187380.0, ans=0.125 2024-08-11 16:34:58,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1187480.0, ans=0.0 2024-08-11 16:34:59,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.91 vs. limit=22.5 2024-08-11 16:35:00,753 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 16:35:31,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1187680.0, ans=0.125 2024-08-11 16:35:58,486 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 16:36:04,564 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2850, loss[loss=0.1051, beats_loss=0.01399, ecapa_loss=0.0001878, whisper_loss=0.08924, over 21589.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01121, ecapa_loss=0.0001909, whisper_loss=0.0937, over 3853036.37 frames. 
], batch size: 91, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:36:20,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1187980.0, ans=0.0 2024-08-11 16:36:21,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1187980.0, ans=0.1 2024-08-11 16:36:21,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1187980.0, ans=0.125 2024-08-11 16:36:31,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.618e+01 2.962e+01 3.443e+01 5.615e+01, threshold=5.924e+01, percent-clipped=0.0 2024-08-11 16:36:50,512 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 16:37:10,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1188280.0, ans=0.125 2024-08-11 16:37:28,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2900, loss[loss=0.09677, beats_loss=0.008757, ecapa_loss=0.0002014, whisper_loss=0.086, over 15566.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01124, ecapa_loss=0.0001926, whisper_loss=0.09334, over 3870388.64 frames. ], batch size: 60, lr: 7.15e-03, grad_scale: 1.152921504606847e+18 2024-08-11 16:37:39,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1188380.0, ans=0.0 2024-08-11 16:37:41,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1188380.0, ans=0.07 2024-08-11 16:37:50,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2024-08-11 16:37:54,876 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 16:38:25,031 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 16:38:27,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1188680.0, ans=0.125 2024-08-11 16:38:33,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1188780.0, ans=0.1 2024-08-11 16:38:43,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 2950, loss[loss=0.08862, beats_loss=0.01291, ecapa_loss=0.000155, whisper_loss=0.07416, over 19432.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0112, ecapa_loss=0.0001949, whisper_loss=0.09329, over 3902095.65 frames. ], batch size: 75, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:38:44,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1188880.0, ans=0.0 2024-08-11 16:38:53,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1188880.0, ans=0.125 2024-08-11 16:38:58,940 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 16:39:06,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.714e+01 3.075e+01 3.561e+01 5.736e+01, threshold=6.149e+01, percent-clipped=0.0 2024-08-11 16:39:09,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1189080.0, ans=0.1 2024-08-11 16:39:16,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1189080.0, ans=0.125 2024-08-11 16:39:19,132 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
28 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 16:39:21,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-11 16:39:32,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1189180.0, ans=0.125 2024-08-11 16:39:40,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1189280.0, ans=0.05 2024-08-11 16:39:42,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1189280.0, ans=0.125 2024-08-11 16:39:50,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1189380.0, ans=0.07 2024-08-11 16:39:51,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3000, loss[loss=0.1029, beats_loss=0.01019, ecapa_loss=0.0001887, whisper_loss=0.0908, over 20876.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01118, ecapa_loss=0.0001949, whisper_loss=0.09354, over 3931589.79 frames. ], batch size: 83, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:39:51,288 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 16:40:32,721 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on ASR_libri: loss=0.2566, beats_loss=0, ecapa_loss=0.0006312, whisper_loss=0.2502, over 922467.00 frames. 2024-08-11 16:40:50,137 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on SV_voxceleb1: loss=0.005299, beats_loss=0, ecapa_loss=0.0005299, whisper_loss=0, over 939242.00 frames. 2024-08-11 16:42:48,101 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on AT_audioset: loss=0.02498, beats_loss=0.02498, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 16:42:48,105 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 16:43:07,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1189480.0, ans=0.125 2024-08-11 16:43:12,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5 2024-08-11 16:43:24,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1189580.0, ans=0.125 2024-08-11 16:43:28,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=12.0 2024-08-11 16:43:31,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1189680.0, ans=0.0 2024-08-11 16:43:52,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1189780.0, ans=0.125 2024-08-11 16:43:52,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1189780.0, ans=0.2 2024-08-11 16:43:53,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2024-08-11 16:43:54,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3050, loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0002081, whisper_loss=0.09128, over 21877.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01117, ecapa_loss=0.000195, whisper_loss=0.09392, over 3935501.62 frames. ], batch size: 87, lr: 7.15e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:43:57,319 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
31 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 16:44:16,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+01 2.646e+01 3.011e+01 3.406e+01 6.810e+01, threshold=6.022e+01, percent-clipped=0.0 2024-08-11 16:44:18,694 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.956e-01 2024-08-11 16:44:19,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1190080.0, ans=0.1 2024-08-11 16:44:27,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1190080.0, ans=0.125 2024-08-11 16:44:32,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1190080.0, ans=0.05 2024-08-11 16:44:38,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5 2024-08-11 16:44:43,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1190180.0, ans=0.0 2024-08-11 16:44:50,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-11 16:45:01,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3100, loss[loss=0.1006, beats_loss=0.01096, ecapa_loss=0.0002181, whisper_loss=0.08744, over 18397.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01118, ecapa_loss=0.0001955, whisper_loss=0.09458, over 3955791.18 frames. 
], batch size: 72, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:45:05,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1190380.0, ans=0.0 2024-08-11 16:45:20,468 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 16:45:31,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1190580.0, ans=0.125 2024-08-11 16:45:35,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1190580.0, ans=0.125 2024-08-11 16:45:41,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1190680.0, ans=0.125 2024-08-11 16:45:44,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1190680.0, ans=0.0 2024-08-11 16:45:45,785 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-11 16:45:47,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2024-08-11 16:46:06,369 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-11 16:46:09,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3150, loss[loss=0.09859, beats_loss=0.01155, ecapa_loss=0.0001968, whisper_loss=0.08508, over 17136.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01122, ecapa_loss=0.0001953, whisper_loss=0.09399, over 3905863.17 frames. 
], batch size: 70, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:46:10,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1190880.0, ans=0.0 2024-08-11 16:46:24,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1190980.0, ans=0.0 2024-08-11 16:46:25,145 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-11 16:46:26,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1190980.0, ans=0.2 2024-08-11 16:46:31,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.615e+01 2.879e+01 3.586e+01 1.580e+02, threshold=5.758e+01, percent-clipped=2.0 2024-08-11 16:47:09,247 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-11 16:47:15,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3200, loss[loss=0.09811, beats_loss=0.01164, ecapa_loss=0.0002124, whisper_loss=0.08435, over 13392.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.0112, ecapa_loss=0.000194, whisper_loss=0.09335, over 3867912.24 frames. ], batch size: 55, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:47:20,328 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.605e-02 2024-08-11 16:47:37,493 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 16:47:37,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1191480.0, ans=0.07 2024-08-11 16:47:40,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1191480.0, ans=0.0 2024-08-11 16:47:49,976 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 16:47:52,716 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 16:48:05,792 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 16:48:07,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-11 16:48:22,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3250, loss[loss=0.09016, beats_loss=0.01246, ecapa_loss=0.0002371, whisper_loss=0.07534, over 21123.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01122, ecapa_loss=0.0001935, whisper_loss=0.09376, over 3871702.97 frames. ], batch size: 91, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:48:25,301 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 16:48:35,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1191980.0, ans=0.2 2024-08-11 16:48:39,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1191980.0, ans=0.125 2024-08-11 16:48:45,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.517e+01 2.867e+01 3.292e+01 6.213e+01, threshold=5.733e+01, percent-clipped=1.0 2024-08-11 16:48:48,199 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 16:48:53,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=12.0 2024-08-11 16:49:29,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3300, loss[loss=0.1062, beats_loss=0.009426, ecapa_loss=0.0001916, whisper_loss=0.09485, over 16061.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.01128, ecapa_loss=0.0001935, whisper_loss=0.09312, over 3884159.81 frames. ], batch size: 64, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:49:30,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1192380.0, ans=0.0 2024-08-11 16:49:32,820 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-11 16:49:43,328 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 16:49:49,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2024-08-11 16:50:09,083 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 16:50:16,010 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-11 16:50:18,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2024-08-11 16:50:22,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-11 16:50:31,704 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-11 16:50:33,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1192780.0, ans=0.125 2024-08-11 16:50:37,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3350, loss[loss=0.0912, beats_loss=0.01148, ecapa_loss=0.0001841, whisper_loss=0.07787, over 20616.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0001934, whisper_loss=0.09368, over 3900041.17 frames. 
], batch size: 84, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:50:38,507 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 16:50:48,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1192980.0, ans=0.1 2024-08-11 16:50:49,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1192980.0, ans=0.125 2024-08-11 16:50:49,984 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-11 16:50:54,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1192980.0, ans=0.125 2024-08-11 16:50:59,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.599e+01 2.933e+01 3.463e+01 7.726e+01, threshold=5.866e+01, percent-clipped=2.0 2024-08-11 16:51:16,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1193180.0, ans=0.125 2024-08-11 16:51:19,505 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-11 16:51:23,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=15.0 2024-08-11 16:51:28,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1193280.0, ans=0.125 2024-08-11 16:51:29,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1193280.0, ans=0.0 2024-08-11 16:51:42,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3400, loss[loss=0.08335, beats_loss=0.01477, ecapa_loss=0.0001958, whisper_loss=0.06662, over 20031.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01123, ecapa_loss=0.0001953, whisper_loss=0.09257, over 3917352.95 frames. ], batch size: 87, lr: 7.14e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:51:44,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1193380.0, ans=0.125 2024-08-11 16:51:52,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1193380.0, ans=0.125 2024-08-11 16:52:00,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1193480.0, ans=0.0 2024-08-11 16:52:20,321 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-11 16:52:24,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1193680.0, ans=0.125 2024-08-11 16:52:41,667 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 16:52:48,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3450, loss[loss=0.125, beats_loss=0.01018, ecapa_loss=0.0002691, whisper_loss=0.1122, over 21521.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001954, whisper_loss=0.09259, over 3914417.96 frames. 
], batch size: 90, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:52:54,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1193880.0, ans=0.125 2024-08-11 16:52:54,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1193880.0, ans=0.0 2024-08-11 16:52:57,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1193880.0, ans=0.125 2024-08-11 16:52:59,552 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 16:53:06,231 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 16:53:06,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1193980.0, ans=0.0 2024-08-11 16:53:08,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1193980.0, ans=0.0 2024-08-11 16:53:11,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.578e+01 2.987e+01 3.563e+01 4.797e+01, threshold=5.975e+01, percent-clipped=0.0 2024-08-11 16:53:34,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1194180.0, ans=0.125 2024-08-11 16:53:50,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1194280.0, ans=0.125 2024-08-11 16:53:50,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1194280.0, ans=0.0 2024-08-11 16:53:54,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3500, loss[loss=0.09995, beats_loss=0.01082, ecapa_loss=0.0002041, whisper_loss=0.08709, over 22668.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01118, ecapa_loss=0.0001967, whisper_loss=0.09325, over 3935362.04 frames. ], batch size: 92, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:53:58,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1194380.0, ans=0.125 2024-08-11 16:54:05,266 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 16:54:11,433 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-11 16:54:17,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=15.0 2024-08-11 16:54:32,218 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 16:54:33,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1194680.0, ans=0.0 2024-08-11 16:54:40,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1194680.0, ans=0.2 2024-08-11 16:54:40,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1194680.0, ans=0.2 2024-08-11 16:54:41,577 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-11 16:54:47,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1194780.0, ans=0.125 2024-08-11 16:54:56,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1194780.0, ans=0.0 2024-08-11 16:55:00,071 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3550, loss[loss=0.09102, beats_loss=0.01029, ecapa_loss=0.0001605, whisper_loss=0.07913, over 15953.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01115, ecapa_loss=0.000195, whisper_loss=0.09319, over 3924833.80 frames. ], batch size: 62, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:55:00,281 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 16:55:20,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1194980.0, ans=0.0 2024-08-11 16:55:22,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.768e+01 2.986e+01 3.532e+01 5.359e+01, threshold=5.971e+01, percent-clipped=0.0 2024-08-11 16:55:38,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1195080.0, ans=0.05 2024-08-11 16:55:43,357 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-11 16:55:45,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2024-08-11 16:56:02,093 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-11 16:56:03,321 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
19 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 16:56:07,236 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3600, loss[loss=0.1065, beats_loss=0.009741, ecapa_loss=0.0002274, whisper_loss=0.09444, over 21062.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01113, ecapa_loss=0.0001948, whisper_loss=0.09376, over 3923135.03 frames. ], batch size: 88, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:56:11,343 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 16:56:18,323 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 16:56:20,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2024-08-11 16:56:27,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1195480.0, ans=0.125 2024-08-11 16:56:29,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1195480.0, ans=0.1 2024-08-11 16:56:29,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1195480.0, ans=0.125 2024-08-11 16:56:30,356 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 16:56:35,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0 2024-08-11 16:56:38,548 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 16:56:45,630 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
22 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-11 16:56:53,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1195680.0, ans=0.07 2024-08-11 16:56:56,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1195680.0, ans=0.125 2024-08-11 16:57:13,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1195880.0, ans=0.2 2024-08-11 16:57:13,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3650, loss[loss=0.1172, beats_loss=0.01017, ecapa_loss=0.000224, whisper_loss=0.1048, over 22983.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01114, ecapa_loss=0.0001951, whisper_loss=0.09368, over 3934740.20 frames. ], batch size: 93, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:57:21,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1195880.0, ans=0.0 2024-08-11 16:57:25,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1195980.0, ans=0.0 2024-08-11 16:57:29,684 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 16:57:36,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.649e+01 3.037e+01 3.697e+01 5.413e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 16:57:46,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1196080.0, ans=0.125 2024-08-11 16:57:46,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1196080.0, ans=0.125 2024-08-11 16:57:50,208 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 16:57:52,557 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 16:57:52,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1196180.0, ans=0.125 2024-08-11 16:58:16,103 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-11 16:58:21,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3700, loss[loss=0.09814, beats_loss=0.01025, ecapa_loss=0.0002577, whisper_loss=0.08531, over 16866.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01117, ecapa_loss=0.0001951, whisper_loss=0.09311, over 3916997.98 frames. ], batch size: 71, lr: 7.13e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:58:22,795 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 16:58:23,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1196380.0, ans=0.125 2024-08-11 16:58:41,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1196480.0, ans=0.125 2024-08-11 16:58:47,113 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 16:58:50,854 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 18 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 16:58:57,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1196580.0, ans=0.1 2024-08-11 16:59:02,824 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 16:59:13,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1196780.0, ans=0.125 2024-08-11 16:59:17,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1196780.0, ans=0.125 2024-08-11 16:59:27,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3750, loss[loss=0.1084, beats_loss=0.01065, ecapa_loss=0.0001876, whisper_loss=0.09585, over 14588.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01124, ecapa_loss=0.0001942, whisper_loss=0.09218, over 3877786.30 frames. ], batch size: 56, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 16:59:37,193 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
31 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 16:59:37,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1196880.0, ans=0.2 2024-08-11 16:59:43,711 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-11 16:59:50,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.626e+01 2.806e+01 3.237e+01 4.971e+01, threshold=5.612e+01, percent-clipped=0.0 2024-08-11 17:00:05,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1197080.0, ans=0.025 2024-08-11 17:00:34,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3800, loss[loss=0.1061, beats_loss=0.01211, ecapa_loss=0.0001941, whisper_loss=0.09209, over 19218.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01131, ecapa_loss=0.0001938, whisper_loss=0.09207, over 3890924.33 frames. ], batch size: 76, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:00:42,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1197380.0, ans=0.2 2024-08-11 17:00:46,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1197480.0, ans=0.0 2024-08-11 17:01:00,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1197580.0, ans=0.125 2024-08-11 17:01:01,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1197580.0, ans=0.1 2024-08-11 17:01:03,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=25.51 vs. limit=22.5 2024-08-11 17:01:13,061 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 17:01:22,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1197680.0, ans=0.07 2024-08-11 17:01:23,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1197680.0, ans=0.2 2024-08-11 17:01:34,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1197780.0, ans=0.125 2024-08-11 17:01:40,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3850, loss[loss=0.1154, beats_loss=0.01247, ecapa_loss=0.000162, whisper_loss=0.1013, over 21755.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01135, ecapa_loss=0.0001935, whisper_loss=0.09207, over 3884470.62 frames. ], batch size: 84, lr: 7.12e-03, grad_scale: 5.764607523034235e+17 2024-08-11 17:01:46,603 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 17:01:53,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=12.0 2024-08-11 17:01:56,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1197980.0, ans=0.5 2024-08-11 17:01:56,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-08-11 17:02:03,659 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+01 2.720e+01 3.010e+01 3.419e+01 7.200e+01, threshold=6.020e+01, percent-clipped=2.0 2024-08-11 17:02:23,577 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
34 from LS+wenet, 24 from Vox, 35 from AS
2024-08-11 17:02:25,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1198180.0, ans=0.025
2024-08-11 17:02:26,740 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 17:02:37,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1198280.0, ans=0.125
2024-08-11 17:02:42,334 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 from AS
2024-08-11 17:02:47,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3900, loss[loss=0.1063, beats_loss=0.01322, ecapa_loss=0.0001721, whisper_loss=0.09135, over 23333.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01128, ecapa_loss=0.000194, whisper_loss=0.09261, over 3899595.07 frames. ], batch size: 93, lr: 7.12e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:02:49,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1198380.0, ans=0.5
2024-08-11 17:03:30,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1198680.0, ans=0.125
2024-08-11 17:03:39,347 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 from AS
2024-08-11 17:03:45,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0
2024-08-11 17:03:47,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1198780.0, ans=0.125
2024-08-11 17:03:52,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1198880.0, ans=0.0
2024-08-11 17:03:53,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 3950, loss[loss=0.0977, beats_loss=0.01266, ecapa_loss=0.0001631, whisper_loss=0.0834, over 23057.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01123, ecapa_loss=0.0001953, whisper_loss=0.09296, over 3908464.71 frames. ], batch size: 90, lr: 7.12e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:03:58,931 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 from AS
2024-08-11 17:04:01,407 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS
2024-08-11 17:04:03,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2024-08-11 17:04:07,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1198980.0, ans=0.125
2024-08-11 17:04:08,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1198980.0, ans=0.0
2024-08-11 17:04:09,469 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 26 from Vox, 27 from AS
2024-08-11 17:04:15,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+01 2.737e+01 3.009e+01 3.546e+01 1.155e+02, threshold=6.019e+01, percent-clipped=1.0
2024-08-11 17:04:15,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1198980.0, ans=0.125
2024-08-11 17:04:40,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=12.0
2024-08-11 17:04:48,821 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:04:52,422 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS
2024-08-11 17:04:58,714 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.014e+05
2024-08-11 17:05:00,932 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4000, loss[loss=0.114, beats_loss=0.009108, ecapa_loss=0.0002263, whisper_loss=0.1026, over 22290.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001958, whisper_loss=0.09285, over 3925105.99 frames. ], batch size: 90, lr: 7.12e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:05:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1199380.0, ans=0.125
2024-08-11 17:05:16,957 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 from AS
2024-08-11 17:05:19,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1199480.0, ans=0.2
2024-08-11 17:05:21,388 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 28 from Vox, 27 from AS
2024-08-11 17:05:31,347 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 17:05:33,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1199580.0, ans=0.125
2024-08-11 17:05:37,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5
2024-08-11 17:05:42,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1199680.0, ans=0.0
2024-08-11 17:05:46,574 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=3.593e-02
2024-08-11 17:05:53,277 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 19 from Vox, 20 from AS
2024-08-11 17:05:54,577 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 from AS
2024-08-11 17:05:55,898 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS
2024-08-11 17:06:04,914 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 19 from Vox, 38 from AS
2024-08-11 17:06:05,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1199780.0, ans=0.0
2024-08-11 17:06:05,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1199780.0, ans=0.0
2024-08-11 17:06:11,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4050, loss[loss=0.1196, beats_loss=0.009788, ecapa_loss=0.000245, whisper_loss=0.1074, over 22613.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01132, ecapa_loss=0.0001967, whisper_loss=0.09267, over 3899351.43 frames. ], batch size: 94, lr: 7.12e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:06:15,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1199880.0, ans=0.125
2024-08-11 17:06:15,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1199880.0, ans=0.125
2024-08-11 17:06:26,569 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-120000.pt
2024-08-11 17:06:30,672 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS
2024-08-11 17:06:33,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1199980.0, ans=0.1
2024-08-11 17:06:37,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.884e+01 3.098e+01 3.625e+01 5.878e+01, threshold=6.196e+01, percent-clipped=0.0
2024-08-11 17:06:38,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0
2024-08-11 17:06:44,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1200080.0, ans=0.0
2024-08-11 17:06:47,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2024-08-11 17:06:55,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1200180.0, ans=0.125
2024-08-11 17:07:03,845 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS
2024-08-11 17:07:10,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1200280.0, ans=0.125
2024-08-11 17:07:23,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4100, loss[loss=0.1093, beats_loss=0.01195, ecapa_loss=0.0002457, whisper_loss=0.09492, over 19239.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01135, ecapa_loss=0.0001969, whisper_loss=0.09299, over 3899141.87 frames. ], batch size: 82, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:07:35,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200480.0, ans=0.1
2024-08-11 17:07:37,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1200480.0, ans=0.125
2024-08-11 17:07:42,870 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS
2024-08-11 17:08:00,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5
2024-08-11 17:08:05,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1200680.0, ans=0.2
2024-08-11 17:08:07,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200680.0, ans=0.1
2024-08-11 17:08:17,665 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 24 from Vox, 27 from AS
2024-08-11 17:08:22,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5
2024-08-11 17:08:32,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4150, loss[loss=0.09349, beats_loss=0.01316, ecapa_loss=0.0001979, whisper_loss=0.07835, over 21406.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01138, ecapa_loss=0.0001967, whisper_loss=0.09303, over 3913369.62 frames. ], batch size: 86, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:08:40,006 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 13 from Vox, 46 from AS
2024-08-11 17:08:47,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1200980.0, ans=0.0
2024-08-11 17:08:52,812 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 28 from Vox, 26 from AS
2024-08-11 17:08:55,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.681e+01 3.148e+01 3.707e+01 5.413e+01, threshold=6.297e+01, percent-clipped=0.0
2024-08-11 17:08:57,658 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:09:22,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1201180.0, ans=0.125
2024-08-11 17:09:42,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4200, loss[loss=0.1039, beats_loss=0.01106, ecapa_loss=0.0001382, whisper_loss=0.09142, over 17963.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01131, ecapa_loss=0.0001966, whisper_loss=0.09384, over 3914285.02 frames. ], batch size: 67, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:09:52,976 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 from AS
2024-08-11 17:09:54,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2024-08-11 17:10:28,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0
2024-08-11 17:10:41,780 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 from AS
2024-08-11 17:10:51,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1201880.0, ans=0.0
2024-08-11 17:10:52,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4250, loss[loss=0.09049, beats_loss=0.01604, ecapa_loss=0.0001683, whisper_loss=0.07276, over 16463.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01127, ecapa_loss=0.0001965, whisper_loss=0.09354, over 3917018.79 frames. ], batch size: 67, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:11:03,656 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS
2024-08-11 17:11:16,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.599e+01 2.986e+01 3.415e+01 8.403e+01, threshold=5.972e+01, percent-clipped=2.0
2024-08-11 17:11:33,012 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS
2024-08-11 17:11:43,510 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 from AS
2024-08-11 17:12:01,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4300, loss[loss=0.1222, beats_loss=0.012, ecapa_loss=0.000185, whisper_loss=0.1084, over 23060.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01127, ecapa_loss=0.0001957, whisper_loss=0.09325, over 3916084.86 frames. ], batch size: 93, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:12:27,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1202580.0, ans=0.0
2024-08-11 17:12:32,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=6.0
2024-08-11 17:12:37,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1202580.0, ans=0.125
2024-08-11 17:12:55,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1202780.0, ans=0.0
2024-08-11 17:13:00,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1202780.0, ans=0.125
2024-08-11 17:13:00,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1202780.0, ans=0.0
2024-08-11 17:13:01,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1202780.0, ans=0.125
2024-08-11 17:13:04,680 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 17:13:07,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1202780.0, ans=0.0
2024-08-11 17:13:11,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4350, loss[loss=0.1115, beats_loss=0.00949, ecapa_loss=0.0002339, whisper_loss=0.09971, over 14074.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01113, ecapa_loss=0.0001978, whisper_loss=0.09332, over 3888128.95 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:13:23,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1202880.0, ans=0.95
2024-08-11 17:13:30,380 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS
2024-08-11 17:13:31,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1202980.0, ans=0.125
2024-08-11 17:13:34,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1202980.0, ans=0.2
2024-08-11 17:13:35,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.556e+01 3.068e+01 3.501e+01 5.955e+01, threshold=6.137e+01, percent-clipped=0.0
2024-08-11 17:13:39,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1203080.0, ans=0.125
2024-08-11 17:13:42,808 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 from AS
2024-08-11 17:13:45,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1203080.0, ans=0.125
2024-08-11 17:13:47,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2024-08-11 17:14:00,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0
2024-08-11 17:14:02,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1203180.0, ans=0.1
2024-08-11 17:14:15,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5
2024-08-11 17:14:21,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4400, loss[loss=0.1011, beats_loss=0.01162, ecapa_loss=0.00014, whisper_loss=0.08809, over 20054.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.0001955, whisper_loss=0.09339, over 3901323.34 frames. ], batch size: 76, lr: 7.11e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:14:36,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1203480.0, ans=0.125
2024-08-11 17:14:41,931 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 from AS
2024-08-11 17:14:46,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1203480.0, ans=0.125
2024-08-11 17:14:48,723 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 17:14:52,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1203580.0, ans=0.0
2024-08-11 17:15:34,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4450, loss[loss=0.07143, beats_loss=0.009486, ecapa_loss=0.0001469, whisper_loss=0.06047, over 16989.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01116, ecapa_loss=0.0001949, whisper_loss=0.0932, over 3911856.85 frames. ], batch size: 63, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:15:34,449 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS
2024-08-11 17:15:41,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1203880.0, ans=0.125
2024-08-11 17:15:54,563 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 20 from Vox, 22 from AS
2024-08-11 17:15:59,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1203980.0, ans=0.125
2024-08-11 17:16:01,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0
2024-08-11 17:16:02,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.738e+01 3.141e+01 3.648e+01 6.257e+01, threshold=6.281e+01, percent-clipped=1.0
2024-08-11 17:16:02,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1203980.0, ans=0.125
2024-08-11 17:16:14,985 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 from AS
2024-08-11 17:16:16,588 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.172e+02
2024-08-11 17:16:20,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1204180.0, ans=0.125
2024-08-11 17:16:28,076 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 17:16:37,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1204280.0, ans=0.2
2024-08-11 17:16:39,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2024-08-11 17:16:45,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1204280.0, ans=0.125
2024-08-11 17:16:54,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4500, loss[loss=0.09454, beats_loss=0.01361, ecapa_loss=0.0002293, whisper_loss=0.07863, over 19357.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01126, ecapa_loss=0.0001951, whisper_loss=0.09319, over 3919380.87 frames. ], batch size: 86, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:16:59,216 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 from AS
2024-08-11 17:17:07,593 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.518e-03
2024-08-11 17:17:40,912 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 from AS
2024-08-11 17:18:07,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1204780.0, ans=0.125
2024-08-11 17:18:10,985 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 37 from LS+wenet, 18 from Vox, 31 from AS
2024-08-11 17:18:17,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4550, loss[loss=0.08639, beats_loss=0.0119, ecapa_loss=0.0001736, whisper_loss=0.07276, over 18327.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01125, ecapa_loss=0.0001946, whisper_loss=0.09289, over 3939220.84 frames. ], batch size: 71, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:18:35,529 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 31 from Vox, 36 from AS
2024-08-11 17:18:39,436 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS
2024-08-11 17:18:43,444 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 from AS
2024-08-11 17:18:44,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.743e+01 3.155e+01 3.839e+01 5.758e+01, threshold=6.310e+01, percent-clipped=0.0
2024-08-11 17:19:24,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5
2024-08-11 17:19:24,978 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 from AS
2024-08-11 17:19:25,366 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.061e-02
2024-08-11 17:19:31,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1205280.0, ans=0.1
2024-08-11 17:19:34,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4600, loss[loss=0.1109, beats_loss=0.01132, ecapa_loss=0.0002088, whisper_loss=0.09753, over 21269.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001954, whisper_loss=0.09265, over 3899333.32 frames. ], batch size: 90, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:19:37,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1205380.0, ans=0.125
2024-08-11 17:19:38,648 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 13 from Vox, 27 from AS
2024-08-11 17:19:52,759 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 from AS
2024-08-11 17:20:04,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1205580.0, ans=0.125
2024-08-11 17:20:25,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1205680.0, ans=0.2
2024-08-11 17:20:43,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1205780.0, ans=0.125
2024-08-11 17:20:54,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4650, loss[loss=0.09658, beats_loss=0.01181, ecapa_loss=0.0001609, whisper_loss=0.08316, over 16412.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0112, ecapa_loss=0.0001965, whisper_loss=0.09274, over 3885468.71 frames. ], batch size: 62, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:21:05,488 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS
2024-08-11 17:21:08,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1205880.0, ans=0.1
2024-08-11 17:21:23,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.186e+01 2.651e+01 2.897e+01 3.330e+01 4.454e+01, threshold=5.794e+01, percent-clipped=0.0
2024-08-11 17:21:29,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1206080.0, ans=0.0
2024-08-11 17:21:39,042 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 from AS
2024-08-11 17:21:47,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1206180.0, ans=0.0
2024-08-11 17:21:50,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1206180.0, ans=0.125
2024-08-11 17:21:57,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1206180.0, ans=0.125
2024-08-11 17:22:02,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1206280.0, ans=0.0
2024-08-11 17:22:05,554 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 17:22:09,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0
2024-08-11 17:22:10,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1206280.0, ans=0.125
2024-08-11 17:22:11,609 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 from AS
2024-08-11 17:22:12,749 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4700, loss[loss=0.09609, beats_loss=0.01214, ecapa_loss=0.0001796, whisper_loss=0.08215, over 15918.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01134, ecapa_loss=0.0001938, whisper_loss=0.09208, over 3915795.31 frames. ], batch size: 64, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:22:16,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1206380.0, ans=0.1
2024-08-11 17:22:21,007 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 from AS
2024-08-11 17:22:21,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1206380.0, ans=0.0
2024-08-11 17:22:46,471 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 17:22:55,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1206680.0, ans=0.1
2024-08-11 17:23:06,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1206780.0, ans=0.0
2024-08-11 17:23:19,408 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4750, loss[loss=0.0965, beats_loss=0.00861, ecapa_loss=0.00025, whisper_loss=0.08539, over 16288.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01126, ecapa_loss=0.0001952, whisper_loss=0.09226, over 3928852.73 frames. ], batch size: 66, lr: 7.10e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:23:19,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1206880.0, ans=0.0
2024-08-11 17:23:33,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5
2024-08-11 17:23:34,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1206980.0, ans=0.125
2024-08-11 17:23:42,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.773e+01 3.300e+01 3.701e+01 2.356e+02, threshold=6.600e+01, percent-clipped=1.0
2024-08-11 17:23:42,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1206980.0, ans=0.2
2024-08-11 17:23:48,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0
2024-08-11 17:23:55,827 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 from AS
2024-08-11 17:24:09,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.92 vs. limit=10.0
2024-08-11 17:24:12,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1207280.0, ans=0.1
2024-08-11 17:24:19,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1207280.0, ans=0.125
2024-08-11 17:24:26,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4800, loss[loss=0.1127, beats_loss=0.01299, ecapa_loss=0.0001398, whisper_loss=0.09833, over 22155.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0113, ecapa_loss=0.0001953, whisper_loss=0.09247, over 3934046.29 frames. ], batch size: 87, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:24:26,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1207380.0, ans=0.125
2024-08-11 17:24:34,060 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 15 from Vox, 48 from AS
2024-08-11 17:24:34,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5
2024-08-11 17:24:39,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1207480.0, ans=0.1
2024-08-11 17:24:45,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1207480.0, ans=10.0
2024-08-11 17:24:47,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1207480.0, ans=0.09899494936611666
2024-08-11 17:25:05,100 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 30 from Vox, 29 from AS
2024-08-11 17:25:17,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1207680.0, ans=0.125
2024-08-11 17:25:32,947 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4850, loss[loss=0.1095, beats_loss=0.01141, ecapa_loss=0.0001873, whisper_loss=0.09625, over 23136.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01131, ecapa_loss=0.0001935, whisper_loss=0.09219, over 3924026.15 frames. ], batch size: 96, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:25:37,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03
2024-08-11 17:25:38,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1207880.0, ans=0.04949747468305833
2024-08-11 17:25:38,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1207880.0, ans=0.125
2024-08-11 17:25:43,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=12.0
2024-08-11 17:25:49,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1207980.0, ans=0.125
2024-08-11 17:25:53,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1207980.0, ans=0.0
2024-08-11 17:25:55,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.586e+01 2.829e+01 3.279e+01 4.850e+01, threshold=5.658e+01, percent-clipped=0.0
2024-08-11 17:25:55,908 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 from AS
2024-08-11 17:26:10,732 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 from AS
2024-08-11 17:26:13,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1208180.0, ans=0.125
2024-08-11 17:26:26,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1208280.0, ans=0.125
2024-08-11 17:26:30,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0
2024-08-11 17:26:31,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1208280.0, ans=0.125
2024-08-11 17:26:39,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4900, loss[loss=0.1099, beats_loss=0.01161, ecapa_loss=0.0001818, whisper_loss=0.0965, over 19921.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0113, ecapa_loss=0.000194, whisper_loss=0.09284, over 3921628.03 frames. ], batch size: 79, lr: 7.09e-03, grad_scale: 5.764607523034235e+17
2024-08-11 17:26:47,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0
2024-08-11 17:26:49,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1208380.0, ans=0.125
2024-08-11 17:26:59,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1208480.0, ans=0.0
2024-08-11 17:27:07,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1208580.0, ans=0.0
2024-08-11 17:27:32,738 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.883e+00
2024-08-11 17:27:50,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 4950, loss[loss=0.09944, beats_loss=0.01257, ecapa_loss=0.0002787, whisper_loss=0.08409, over 20406.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0113, ecapa_loss=0.0001948, whisper_loss=0.09297, over 3914446.09 frames. ], batch size: 89, lr: 7.09e-03, grad_scale: 1.152921504606847e+18
2024-08-11 17:27:53,779 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 21 from Vox, 31 from AS
2024-08-11 17:27:55,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0
2024-08-11 17:27:56,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1208880.0, ans=0.125
2024-08-11 17:28:15,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.567e+01 2.832e+01 3.214e+01 4.886e+01, threshold=5.664e+01, percent-clipped=0.0
2024-08-11 17:28:21,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1209080.0, ans=0.1
2024-08-11 17:28:31,836 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 31 from Vox, 32 from AS
2024-08-11 17:28:36,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1209180.0, ans=0.0
2024-08-11 17:28:39,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1209180.0, ans=0.0
2024-08-11 17:29:03,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.15 vs. limit=10.0
2024-08-11 17:29:04,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5000, loss[loss=0.09648, beats_loss=0.009409, ecapa_loss=0.0001898, whisper_loss=0.08517, over 17074.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01131, ecapa_loss=0.0001949, whisper_loss=0.09291, over 3869995.62 frames.
], batch size: 67, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:29:08,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1209380.0, ans=0.1 2024-08-11 17:29:14,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.11 vs. limit=22.5 2024-08-11 17:29:19,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1209480.0, ans=0.125 2024-08-11 17:29:37,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-11 17:29:46,134 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-11 17:29:54,939 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 17:29:55,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1209680.0, ans=0.125 2024-08-11 17:29:57,006 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-11 17:30:12,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1209780.0, ans=0.0 2024-08-11 17:30:18,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1209880.0, ans=0.125 2024-08-11 17:30:19,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5050, loss[loss=0.1078, beats_loss=0.01181, ecapa_loss=0.000166, whisper_loss=0.09429, over 23088.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001942, whisper_loss=0.09308, over 3874210.58 frames. 
], batch size: 93, lr: 7.09e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:30:37,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1209980.0, ans=0.0 2024-08-11 17:30:44,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.569e+01 2.847e+01 3.482e+01 7.100e+01, threshold=5.695e+01, percent-clipped=3.0 2024-08-11 17:30:50,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1210080.0, ans=0.125 2024-08-11 17:30:54,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1210080.0, ans=0.2 2024-08-11 17:31:08,085 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 17:31:13,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1210180.0, ans=0.0 2024-08-11 17:31:14,369 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-11 17:31:21,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1210280.0, ans=0.125 2024-08-11 17:31:23,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1210280.0, ans=0.0 2024-08-11 17:31:27,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1210280.0, ans=0.125 2024-08-11 17:31:33,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1210380.0, ans=0.0 2024-08-11 17:31:35,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5100, loss[loss=0.09633, beats_loss=0.01161, ecapa_loss=0.0002482, whisper_loss=0.08224, over 20732.00 frames. 
], tot_loss[loss=0.1075, beats_loss=0.01122, ecapa_loss=0.0001941, whisper_loss=0.09432, over 3922328.61 frames. ], batch size: 89, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:32:09,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-08-11 17:32:11,912 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-11 17:32:12,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1210580.0, ans=0.025 2024-08-11 17:32:12,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1210580.0, ans=0.125 2024-08-11 17:32:14,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1210580.0, ans=0.125 2024-08-11 17:32:23,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1210680.0, ans=0.0 2024-08-11 17:32:46,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1210780.0, ans=0.035 2024-08-11 17:32:55,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5150, loss[loss=0.08416, beats_loss=0.01442, ecapa_loss=0.0001995, whisper_loss=0.06774, over 20338.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01126, ecapa_loss=0.0001924, whisper_loss=0.09397, over 3905732.31 frames. ], batch size: 90, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:32:55,240 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-11 17:33:12,923 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
26 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-11 17:33:22,095 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.614e+01 3.081e+01 3.730e+01 5.554e+01, threshold=6.161e+01, percent-clipped=0.0 2024-08-11 17:33:44,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1211180.0, ans=0.09899494936611666 2024-08-11 17:34:11,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5200, loss[loss=0.1076, beats_loss=0.01194, ecapa_loss=0.0001792, whisper_loss=0.09383, over 19571.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01128, ecapa_loss=0.0001915, whisper_loss=0.09391, over 3889781.88 frames. ], batch size: 77, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:34:21,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1211380.0, ans=0.1 2024-08-11 17:34:24,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1211380.0, ans=0.0 2024-08-11 17:34:24,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=12.0 2024-08-11 17:34:43,349 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 17:34:51,539 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 17:34:51,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1211580.0, ans=0.125 2024-08-11 17:35:09,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1211680.0, ans=0.0 2024-08-11 17:35:15,674 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:35:15,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1211780.0, ans=0.125 2024-08-11 17:35:29,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5250, loss[loss=0.119, beats_loss=0.01102, ecapa_loss=0.0001691, whisper_loss=0.1063, over 20481.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.0112, ecapa_loss=0.0001911, whisper_loss=0.09384, over 3868246.14 frames. ], batch size: 78, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:35:36,067 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-11 17:35:36,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1211880.0, ans=0.0 2024-08-11 17:35:39,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1211880.0, ans=0.125 2024-08-11 17:35:45,580 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
28 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-11 17:35:57,401 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+01 2.672e+01 3.061e+01 3.448e+01 6.321e+01, threshold=6.122e+01, percent-clipped=2.0 2024-08-11 17:35:59,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1211980.0, ans=0.125 2024-08-11 17:36:02,241 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 17:36:21,709 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-11 17:36:29,538 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-11 17:36:37,309 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-11 17:36:39,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1212280.0, ans=0.1 2024-08-11 17:36:48,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5300, loss[loss=0.1085, beats_loss=0.01068, ecapa_loss=0.000185, whisper_loss=0.09594, over 18228.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01117, ecapa_loss=0.0001916, whisper_loss=0.0937, over 3868132.85 frames. 
], batch size: 71, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:36:56,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1212380.0, ans=0.0 2024-08-11 17:37:04,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1212480.0, ans=0.0 2024-08-11 17:37:17,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1212580.0, ans=0.0 2024-08-11 17:37:25,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1212580.0, ans=0.0 2024-08-11 17:37:25,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2024-08-11 17:37:30,408 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-11 17:37:30,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1212580.0, ans=0.95 2024-08-11 17:37:37,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-08-11 17:37:49,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-11 17:37:50,438 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-11 17:37:51,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. 
limit=15.0 2024-08-11 17:38:00,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=1212780.0, ans=12.0 2024-08-11 17:38:04,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5350, loss[loss=0.08208, beats_loss=0.01255, ecapa_loss=0.0001651, whisper_loss=0.06788, over 16156.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01113, ecapa_loss=0.0001916, whisper_loss=0.09329, over 3848319.39 frames. ], batch size: 64, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:38:29,924 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.524e+01 2.904e+01 3.271e+01 6.276e+01, threshold=5.808e+01, percent-clipped=1.0 2024-08-11 17:38:39,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1213080.0, ans=0.0 2024-08-11 17:38:57,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1213180.0, ans=0.125 2024-08-11 17:39:10,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-08-11 17:39:25,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5400, loss[loss=0.1034, beats_loss=0.01193, ecapa_loss=0.0001812, whisper_loss=0.08968, over 20698.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01105, ecapa_loss=0.0001923, whisper_loss=0.09352, over 3839383.92 frames. ], batch size: 83, lr: 7.08e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:39:27,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1213380.0, ans=0.125 2024-08-11 17:39:34,306 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-11 17:39:49,079 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.179e-01 2024-08-11 17:39:55,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1213480.0, ans=0.0 2024-08-11 17:39:57,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1213580.0, ans=0.125 2024-08-11 17:40:06,622 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:40:26,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2024-08-11 17:40:44,021 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5450, loss[loss=0.1124, beats_loss=0.01088, ecapa_loss=0.0001804, whisper_loss=0.09975, over 22312.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01113, ecapa_loss=0.0001918, whisper_loss=0.09356, over 3865938.44 frames. ], batch size: 89, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:40:53,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0 2024-08-11 17:41:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1213980.0, ans=0.035 2024-08-11 17:41:11,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+01 2.638e+01 2.966e+01 3.379e+01 5.199e+01, threshold=5.933e+01, percent-clipped=0.0 2024-08-11 17:41:14,685 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
16 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 17:41:35,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2024-08-11 17:41:45,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1214280.0, ans=0.04949747468305833 2024-08-11 17:41:47,508 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-11 17:42:02,269 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-11 17:42:03,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5500, loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001897, whisper_loss=0.08964, over 16433.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0111, ecapa_loss=0.0001923, whisper_loss=0.09308, over 3854459.68 frames. ], batch size: 64, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:42:09,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1214380.0, ans=0.2 2024-08-11 17:42:11,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2024-08-11 17:42:15,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1214380.0, ans=0.2 2024-08-11 17:43:00,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1214680.0, ans=0.125 2024-08-11 17:43:02,755 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 35 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 17:43:03,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. 
limit=15.0 2024-08-11 17:43:18,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1214780.0, ans=0.125 2024-08-11 17:43:25,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5550, loss[loss=0.1194, beats_loss=0.01086, ecapa_loss=0.0001804, whisper_loss=0.1067, over 21930.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01111, ecapa_loss=0.0001943, whisper_loss=0.09368, over 3900650.27 frames. ], batch size: 88, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:43:25,532 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 17:43:35,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1214880.0, ans=0.0 2024-08-11 17:43:49,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1214980.0, ans=0.07 2024-08-11 17:43:53,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.662e+01 2.905e+01 3.480e+01 6.680e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-11 17:44:07,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1215080.0, ans=0.2 2024-08-11 17:44:21,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1215180.0, ans=0.125 2024-08-11 17:44:23,071 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 17:44:37,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1215280.0, ans=0.0 2024-08-11 17:44:38,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1215280.0, ans=0.1 2024-08-11 17:44:46,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5600, loss[loss=0.103, beats_loss=0.009331, ecapa_loss=0.0001567, whisper_loss=0.09215, over 17017.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0001942, whisper_loss=0.09367, over 3897516.45 frames. ], batch size: 62, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:44:52,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1215380.0, ans=0.0 2024-08-11 17:44:58,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1215380.0, ans=0.125 2024-08-11 17:45:13,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1215480.0, ans=0.125 2024-08-11 17:45:27,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1215580.0, ans=15.0 2024-08-11 17:45:42,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1215680.0, ans=0.0 2024-08-11 17:45:59,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1215780.0, ans=0.125 2024-08-11 17:46:04,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1215880.0, ans=0.5 2024-08-11 17:46:05,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5650, 
loss[loss=0.1113, beats_loss=0.009978, ecapa_loss=0.0002077, whisper_loss=0.09923, over 21845.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01119, ecapa_loss=0.0001933, whisper_loss=0.09313, over 3919945.44 frames. ], batch size: 90, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:46:31,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+01 2.596e+01 3.008e+01 3.518e+01 5.757e+01, threshold=6.016e+01, percent-clipped=0.0 2024-08-11 17:46:35,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-11 17:46:46,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1216080.0, ans=0.2 2024-08-11 17:46:58,506 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 39 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-11 17:47:08,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0 2024-08-11 17:47:22,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5700, loss[loss=0.117, beats_loss=0.01118, ecapa_loss=0.000138, whisper_loss=0.1045, over 21548.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01122, ecapa_loss=0.0001942, whisper_loss=0.09282, over 3900980.17 frames. ], batch size: 81, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:47:31,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. 
limit=15.0 2024-08-11 17:47:37,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1216380.0, ans=0.0 2024-08-11 17:47:47,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1216480.0, ans=0.0 2024-08-11 17:48:18,965 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 17:48:19,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-08-11 17:48:42,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5750, loss[loss=0.0997, beats_loss=0.01008, ecapa_loss=0.0002051, whisper_loss=0.08757, over 22007.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01119, ecapa_loss=0.0001942, whisper_loss=0.09312, over 3885463.92 frames. ], batch size: 87, lr: 7.07e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:48:54,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2024-08-11 17:49:03,455 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 17:49:08,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.576e+01 2.968e+01 3.290e+01 6.597e+01, threshold=5.936e+01, percent-clipped=1.0 2024-08-11 17:49:09,181 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-11 17:49:18,885 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 17:49:20,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1217080.0, ans=0.1 2024-08-11 17:49:29,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1217180.0, ans=0.0 2024-08-11 17:49:41,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-11 17:49:50,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1217280.0, ans=0.125 2024-08-11 17:49:53,002 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 17:50:00,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5800, loss[loss=0.08389, beats_loss=0.01102, ecapa_loss=0.0001898, whisper_loss=0.07098, over 17018.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0112, ecapa_loss=0.0001939, whisper_loss=0.0927, over 3855382.82 frames. ], batch size: 67, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:50:07,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1217380.0, ans=0.0 2024-08-11 17:50:28,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1217480.0, ans=0.125 2024-08-11 17:50:33,573 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-11 17:50:39,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1217580.0, ans=0.2 2024-08-11 17:50:53,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.63 vs. limit=15.0 2024-08-11 17:51:05,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1217780.0, ans=0.0 2024-08-11 17:51:15,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5850, loss[loss=0.1175, beats_loss=0.009753, ecapa_loss=0.0002614, whisper_loss=0.1052, over 17425.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001933, whisper_loss=0.09262, over 3866543.90 frames. ], batch size: 73, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:51:23,681 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-11 17:51:35,982 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-11 17:51:36,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2024-08-11 17:51:39,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.535e+01 2.906e+01 3.221e+01 4.693e+01, threshold=5.811e+01, percent-clipped=0.0 2024-08-11 17:51:44,744 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 17:52:04,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1218180.0, ans=0.125 2024-08-11 17:52:20,891 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-11 17:52:23,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1218280.0, ans=0.125 2024-08-11 17:52:29,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5900, loss[loss=0.1078, beats_loss=0.01241, ecapa_loss=0.000176, whisper_loss=0.09358, over 22535.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0112, ecapa_loss=0.0001935, whisper_loss=0.09314, over 3858167.60 frames. ], batch size: 91, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:52:31,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=22.5 2024-08-11 17:52:36,886 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-11 17:52:44,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-08-11 17:52:45,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-11 17:52:55,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1218480.0, ans=0.1 2024-08-11 17:52:59,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1218580.0, ans=0.125 2024-08-11 17:53:01,564 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
22 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 17:53:11,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1218580.0, ans=0.125 2024-08-11 17:53:33,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1218780.0, ans=0.125 2024-08-11 17:53:47,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 5950, loss[loss=0.114, beats_loss=0.01041, ecapa_loss=0.0001591, whisper_loss=0.102, over 17726.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01133, ecapa_loss=0.0001929, whisper_loss=0.09225, over 3867762.14 frames. ], batch size: 68, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:53:59,417 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 17:53:59,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1218880.0, ans=0.0 2024-08-11 17:54:13,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.543e+01 2.844e+01 3.292e+01 4.976e+01, threshold=5.688e+01, percent-clipped=0.0 2024-08-11 17:54:19,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1219080.0, ans=0.125 2024-08-11 17:55:03,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6000, loss[loss=0.1116, beats_loss=0.01168, ecapa_loss=0.0001691, whisper_loss=0.09824, over 23563.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01134, ecapa_loss=0.0001911, whisper_loss=0.09258, over 3860386.37 frames. 
], batch size: 91, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:55:03,133 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 17:55:39,284 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0006361, whisper_loss=0.2509, over 922467.00 frames. 2024-08-11 17:55:57,646 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on SV_voxceleb1: loss=0.005086, beats_loss=0, ecapa_loss=0.0005086, whisper_loss=0, over 939242.00 frames. 2024-08-11 17:57:42,062 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on AT_audioset: loss=0.02513, beats_loss=0.02513, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 17:57:42,067 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 17:57:42,228 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 17:57:58,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0 2024-08-11 17:57:59,461 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 17:58:33,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1219680.0, ans=0.125 2024-08-11 17:58:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1219680.0, ans=0.0 2024-08-11 17:58:41,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=12.0 2024-08-11 17:58:42,326 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 17:59:06,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6050, loss[loss=0.1303, beats_loss=0.009044, ecapa_loss=0.0001915, whisper_loss=0.1193, over 20550.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01138, ecapa_loss=0.0001915, whisper_loss=0.09185, over 3828659.02 frames. ], batch size: 77, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 17:59:07,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1219880.0, ans=0.125 2024-08-11 17:59:15,811 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 17:59:17,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1219880.0, ans=0.125 2024-08-11 17:59:34,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.570e+01 2.877e+01 3.382e+01 4.916e+01, threshold=5.754e+01, percent-clipped=0.0 2024-08-11 17:59:34,773 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 17:59:48,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1220080.0, ans=0.025 2024-08-11 18:00:09,438 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 18:00:26,712 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 18:00:29,160 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6100, loss[loss=0.1206, beats_loss=0.008652, ecapa_loss=0.000229, whisper_loss=0.1097, over 21368.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01137, ecapa_loss=0.0001926, whisper_loss=0.09131, over 3816474.32 frames. ], batch size: 85, lr: 7.06e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:00:49,922 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 18:00:55,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2024-08-11 18:00:59,715 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 39 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 18:01:05,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1220580.0, ans=0.04949747468305833 2024-08-11 18:01:11,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1220580.0, ans=0.0 2024-08-11 18:01:44,596 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-11 18:01:52,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6150, loss[loss=0.116, beats_loss=0.01142, ecapa_loss=0.0001665, whisper_loss=0.1029, over 24116.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01132, ecapa_loss=0.0001943, whisper_loss=0.09208, over 3847395.41 frames. ], batch size: 93, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:01:59,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1220880.0, ans=0.2 2024-08-11 18:02:02,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.73 vs. 
limit=15.0 2024-08-11 18:02:03,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1220880.0, ans=0.0 2024-08-11 18:02:12,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1220980.0, ans=0.125 2024-08-11 18:02:17,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1220980.0, ans=0.125 2024-08-11 18:02:20,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+01 2.665e+01 2.922e+01 3.415e+01 6.689e+01, threshold=5.844e+01, percent-clipped=1.0 2024-08-11 18:02:20,477 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-11 18:02:20,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1220980.0, ans=0.125 2024-08-11 18:02:33,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1221080.0, ans=0.07 2024-08-11 18:02:38,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=12.0 2024-08-11 18:02:51,767 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 18:03:04,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1221280.0, ans=0.2 2024-08-11 18:03:06,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2024-08-11 18:03:11,548 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6200, loss[loss=0.1146, beats_loss=0.01068, ecapa_loss=0.0002398, whisper_loss=0.1015, over 18849.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01136, ecapa_loss=0.0001936, whisper_loss=0.09191, over 3852781.66 frames. ], batch size: 79, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:03:25,148 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-11 18:03:50,214 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 33 from Vox, 23 fro AS 2024-08-11 18:03:50,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2024-08-11 18:03:53,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1221580.0, ans=0.1 2024-08-11 18:03:53,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2024-08-11 18:03:59,159 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-11 18:04:13,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1221680.0, ans=0.0 2024-08-11 18:04:14,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1221780.0, ans=0.125 2024-08-11 18:04:31,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6250, loss[loss=0.06594, beats_loss=0.01299, ecapa_loss=0.0002269, whisper_loss=0.05068, over 13829.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0113, ecapa_loss=0.000194, whisper_loss=0.09222, over 3869848.57 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:04:38,416 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 18:04:45,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2024-08-11 18:04:58,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+01 2.590e+01 2.864e+01 3.315e+01 6.460e+01, threshold=5.728e+01, percent-clipped=1.0 2024-08-11 18:04:59,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1221980.0, ans=0.125 2024-08-11 18:05:02,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1222080.0, ans=0.125 2024-08-11 18:05:09,080 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 18:05:09,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.99 vs. limit=22.5 2024-08-11 18:05:12,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1222080.0, ans=0.125 2024-08-11 18:05:27,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1222180.0, ans=0.125 2024-08-11 18:05:33,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=24.42 vs. limit=15.0 2024-08-11 18:05:52,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6300, loss[loss=0.0923, beats_loss=0.01125, ecapa_loss=0.0001664, whisper_loss=0.07939, over 15068.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01127, ecapa_loss=0.0001929, whisper_loss=0.09235, over 3859621.11 frames. 
], batch size: 59, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:05:54,523 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-11 18:07:02,473 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-11 18:07:21,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-11 18:07:29,871 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 18:07:34,824 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 18:07:41,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1222780.0, ans=0.125 2024-08-11 18:07:46,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6350, loss[loss=0.09109, beats_loss=0.01135, ecapa_loss=0.0001904, whisper_loss=0.07784, over 15028.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01116, ecapa_loss=0.0001938, whisper_loss=0.09335, over 3836442.24 frames. ], batch size: 60, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:07:51,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1222880.0, ans=0.04949747468305833 2024-08-11 18:08:17,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.576e+01 2.972e+01 3.431e+01 4.977e+01, threshold=5.945e+01, percent-clipped=0.0 2024-08-11 18:08:21,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. 
limit=15.0 2024-08-11 18:08:45,360 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:08:45,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2024-08-11 18:09:03,764 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-11 18:09:06,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1223280.0, ans=0.125 2024-08-11 18:09:27,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1223280.0, ans=0.07 2024-08-11 18:09:28,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-11 18:09:32,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6400, loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0002312, whisper_loss=0.08834, over 18992.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01111, ecapa_loss=0.0001936, whisper_loss=0.09381, over 3840876.75 frames. ], batch size: 78, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:09:36,236 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-11 18:09:54,369 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-11 18:09:54,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.39 vs. 
limit=15.0 2024-08-11 18:10:02,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1223480.0, ans=0.04949747468305833 2024-08-11 18:10:03,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1223480.0, ans=0.125 2024-08-11 18:10:12,154 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 18:10:25,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1223580.0, ans=0.125 2024-08-11 18:10:30,538 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 18:11:07,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2024-08-11 18:11:23,695 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6450, loss[loss=0.08295, beats_loss=0.01325, ecapa_loss=0.0001851, whisper_loss=0.06785, over 18157.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01114, ecapa_loss=0.0001937, whisper_loss=0.09344, over 3873117.09 frames. ], batch size: 77, lr: 7.05e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:11:24,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.23 vs. 
limit=22.5 2024-08-11 18:11:44,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1223880.0, ans=0.0 2024-08-11 18:11:47,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1223980.0, ans=0.1 2024-08-11 18:12:08,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.662e+01 3.047e+01 3.508e+01 5.395e+01, threshold=6.093e+01, percent-clipped=0.0 2024-08-11 18:12:39,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1224180.0, ans=0.125 2024-08-11 18:13:07,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-11 18:13:12,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1224280.0, ans=0.2 2024-08-11 18:13:26,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6500, loss[loss=0.1042, beats_loss=0.01138, ecapa_loss=0.0002498, whisper_loss=0.09031, over 15909.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01112, ecapa_loss=0.0001933, whisper_loss=0.09385, over 3891165.64 frames. ], batch size: 67, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:14:15,159 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 18:14:40,459 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 18:14:55,930 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 18:15:15,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2024-08-11 18:15:24,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6550, loss[loss=0.1187, beats_loss=0.009757, ecapa_loss=0.0001972, whisper_loss=0.107, over 20480.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01113, ecapa_loss=0.0001945, whisper_loss=0.09366, over 3920310.83 frames. ], batch size: 80, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:15:49,815 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-11 18:15:51,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1224980.0, ans=0.0 2024-08-11 18:16:06,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+01 2.812e+01 3.232e+01 4.010e+01 5.660e+01, threshold=6.463e+01, percent-clipped=0.0 2024-08-11 18:16:09,731 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 18:16:19,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1225080.0, ans=0.0 2024-08-11 18:16:31,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1225180.0, ans=0.125 2024-08-11 18:16:45,546 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 18:16:57,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. limit=10.0 2024-08-11 18:17:05,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6600, loss[loss=0.08532, beats_loss=0.01355, ecapa_loss=0.0001424, whisper_loss=0.07035, over 22286.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01111, ecapa_loss=0.0001948, whisper_loss=0.09431, over 3948602.51 frames. 
], batch size: 88, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:17:08,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1225380.0, ans=0.125 2024-08-11 18:17:38,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-11 18:17:43,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1225580.0, ans=0.2 2024-08-11 18:17:53,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1225580.0, ans=0.125 2024-08-11 18:18:03,722 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 18:18:31,509 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 18:18:33,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6650, loss[loss=0.1115, beats_loss=0.01051, ecapa_loss=0.000197, whisper_loss=0.09904, over 15010.00 frames. ], tot_loss[loss=0.1077, beats_loss=0.01106, ecapa_loss=0.000195, whisper_loss=0.09466, over 3969809.82 frames. ], batch size: 58, lr: 7.04e-03, grad_scale: 1.152921504606847e+18 2024-08-11 18:18:37,643 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 18:18:47,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-11 18:18:56,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1225980.0, ans=0.125 2024-08-11 18:19:02,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. 
limit=12.0 2024-08-11 18:19:02,902 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.730e+01 3.226e+01 3.856e+01 7.096e+01, threshold=6.452e+01, percent-clipped=1.0 2024-08-11 18:19:05,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1225980.0, ans=0.1 2024-08-11 18:19:08,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1226080.0, ans=15.0 2024-08-11 18:19:10,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1226080.0, ans=0.0 2024-08-11 18:19:38,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-11 18:19:40,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1226180.0, ans=0.2 2024-08-11 18:19:41,640 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-11 18:20:00,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6700, loss[loss=0.09432, beats_loss=0.01294, ecapa_loss=0.0001699, whisper_loss=0.07969, over 20556.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.01118, ecapa_loss=0.0001957, whisper_loss=0.09406, over 3942522.13 frames. ], batch size: 83, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:20:04,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1226380.0, ans=0.1 2024-08-11 18:20:07,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-08-11 18:20:22,689 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 18:20:36,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1226580.0, ans=0.125 2024-08-11 18:20:50,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=1226680.0, ans=0.2 2024-08-11 18:20:56,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1226680.0, ans=0.2 2024-08-11 18:20:57,764 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-11 18:21:05,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1226780.0, ans=0.04949747468305833 2024-08-11 18:21:11,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1226780.0, ans=15.0 2024-08-11 18:21:13,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2024-08-11 18:21:16,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2024-08-11 18:21:20,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-08-11 18:21:25,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6750, loss[loss=0.09841, beats_loss=0.01261, ecapa_loss=0.0001842, whisper_loss=0.08396, over 21761.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01118, ecapa_loss=0.0001949, whisper_loss=0.09419, over 3910091.67 frames. 
], batch size: 88, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:21:28,806 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 18:21:49,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1226980.0, ans=0.1 2024-08-11 18:21:51,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1226980.0, ans=0.95 2024-08-11 18:21:57,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.725e+01 3.041e+01 3.593e+01 5.305e+01, threshold=6.083e+01, percent-clipped=0.0 2024-08-11 18:22:04,407 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 18:22:23,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1227180.0, ans=0.125 2024-08-11 18:22:27,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1227180.0, ans=0.125 2024-08-11 18:22:31,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1227180.0, ans=0.125 2024-08-11 18:22:42,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1227280.0, ans=0.125 2024-08-11 18:22:46,338 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 18:22:52,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6800, loss[loss=0.1149, beats_loss=0.01293, ecapa_loss=0.0001587, whisper_loss=0.1003, over 15480.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01114, ecapa_loss=0.0001963, whisper_loss=0.09387, over 3898785.99 frames. 
], batch size: 59, lr: 7.04e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:23:09,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1227480.0, ans=0.05 2024-08-11 18:23:24,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1227480.0, ans=0.125 2024-08-11 18:23:34,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1227580.0, ans=0.0 2024-08-11 18:23:36,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1227580.0, ans=0.0 2024-08-11 18:23:40,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2024-08-11 18:23:44,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1227680.0, ans=0.125 2024-08-11 18:23:48,196 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-11 18:23:51,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1227680.0, ans=0.125 2024-08-11 18:24:00,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1227780.0, ans=0.125 2024-08-11 18:24:19,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6850, loss[loss=0.07723, beats_loss=0.01033, ecapa_loss=0.0001982, whisper_loss=0.06492, over 17971.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01113, ecapa_loss=0.0001942, whisper_loss=0.09431, over 3881351.25 frames. 
], batch size: 71, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:24:20,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5 2024-08-11 18:24:25,784 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 18:24:40,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1227980.0, ans=0.0 2024-08-11 18:24:45,065 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 18:24:52,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.557e+01 2.801e+01 3.138e+01 4.430e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-11 18:24:59,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2024-08-11 18:25:16,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1228180.0, ans=15.0 2024-08-11 18:25:18,233 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:25:35,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-11 18:25:38,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1228280.0, ans=0.2 2024-08-11 18:25:49,209 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 18:25:51,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.25 vs. 
limit=22.5 2024-08-11 18:25:52,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6900, loss[loss=0.09563, beats_loss=0.0135, ecapa_loss=0.0002245, whisper_loss=0.07989, over 21254.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01113, ecapa_loss=0.0001933, whisper_loss=0.09458, over 3879395.81 frames. ], batch size: 94, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:26:13,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1228480.0, ans=0.1 2024-08-11 18:26:35,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1228580.0, ans=0.1 2024-08-11 18:26:45,289 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-11 18:27:07,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1228780.0, ans=0.1 2024-08-11 18:27:23,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 6950, loss[loss=0.09075, beats_loss=0.01124, ecapa_loss=0.0001958, whisper_loss=0.07755, over 16530.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01117, ecapa_loss=0.0001916, whisper_loss=0.0936, over 3875967.35 frames. 
], batch size: 66, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:27:35,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1228880.0, ans=0.125 2024-08-11 18:27:37,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1228880.0, ans=0.125 2024-08-11 18:27:41,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1228980.0, ans=0.0 2024-08-11 18:27:46,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1228980.0, ans=0.125 2024-08-11 18:27:56,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.618e+01 3.004e+01 3.400e+01 5.942e+01, threshold=6.008e+01, percent-clipped=1.0 2024-08-11 18:28:01,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1229080.0, ans=0.2 2024-08-11 18:28:18,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1229180.0, ans=6.0 2024-08-11 18:28:21,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1229180.0, ans=0.125 2024-08-11 18:28:29,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1229180.0, ans=0.0 2024-08-11 18:28:33,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.26 vs. limit=22.5 2024-08-11 18:28:54,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7000, loss[loss=0.1042, beats_loss=0.01198, ecapa_loss=0.000144, whisper_loss=0.09075, over 22660.00 frames. 
], tot_loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001926, whisper_loss=0.09356, over 3906657.29 frames. ], batch size: 87, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:29:05,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1229380.0, ans=0.125 2024-08-11 18:29:13,665 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 18:30:01,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1229680.0, ans=0.125 2024-08-11 18:30:14,869 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-11 18:30:15,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-11 18:30:23,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7050, loss[loss=0.08189, beats_loss=0.01427, ecapa_loss=0.0001806, whisper_loss=0.06582, over 16681.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.0001938, whisper_loss=0.09285, over 3893460.03 frames. ], batch size: 69, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:30:34,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1229880.0, ans=0.125 2024-08-11 18:30:36,339 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 18:30:54,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.713e+01 3.050e+01 3.555e+01 6.661e+01, threshold=6.100e+01, percent-clipped=2.0 2024-08-11 18:30:58,978 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
19 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-11 18:31:04,932 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 18:31:32,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1230180.0, ans=0.125 2024-08-11 18:31:35,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1230280.0, ans=0.0 2024-08-11 18:31:44,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1230280.0, ans=0.1 2024-08-11 18:31:52,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7100, loss[loss=0.09862, beats_loss=0.01323, ecapa_loss=0.0001174, whisper_loss=0.08422, over 14779.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01126, ecapa_loss=0.0001932, whisper_loss=0.09209, over 3859881.20 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:31:54,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1230380.0, ans=0.0 2024-08-11 18:32:10,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1230480.0, ans=0.1 2024-08-11 18:32:24,867 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 18:32:25,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1230580.0, ans=0.0 2024-08-11 18:32:33,970 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.297e-01 2024-08-11 18:32:38,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1230580.0, ans=0.0 2024-08-11 18:32:55,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1230680.0, ans=0.0 2024-08-11 18:33:00,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1230680.0, ans=0.125 2024-08-11 18:33:21,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7150, loss[loss=0.1053, beats_loss=0.01039, ecapa_loss=0.000191, whisper_loss=0.09305, over 21771.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01126, ecapa_loss=0.0001931, whisper_loss=0.09224, over 3894620.88 frames. ], batch size: 85, lr: 7.03e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:33:22,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.11 vs. limit=10.0 2024-08-11 18:33:29,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1230880.0, ans=0.125 2024-08-11 18:33:51,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1230980.0, ans=0.07 2024-08-11 18:33:54,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.688e+01 3.029e+01 3.368e+01 5.006e+01, threshold=6.058e+01, percent-clipped=0.0 2024-08-11 18:33:57,114 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-11 18:34:36,131 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-11 18:34:55,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7200, loss[loss=0.1168, beats_loss=0.009806, ecapa_loss=0.000207, whisper_loss=0.1049, over 22998.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01123, ecapa_loss=0.0001922, whisper_loss=0.09258, over 3913100.25 frames. ], batch size: 93, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:35:08,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2024-08-11 18:35:12,617 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-11 18:35:16,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1231480.0, ans=0.07 2024-08-11 18:35:36,665 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-11 18:35:39,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1231580.0, ans=0.0 2024-08-11 18:35:40,438 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 31 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-11 18:35:48,826 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-11 18:35:50,170 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-11 18:35:59,113 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 18:36:02,306 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 18:36:03,585 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 18:36:18,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1231780.0, ans=0.0 2024-08-11 18:36:21,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7250, loss[loss=0.1209, beats_loss=0.008522, ecapa_loss=0.0001984, whisper_loss=0.1104, over 19239.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01122, ecapa_loss=0.0001929, whisper_loss=0.09331, over 3930203.14 frames. ], batch size: 74, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:36:29,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1231880.0, ans=0.0 2024-08-11 18:36:43,534 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-11 18:36:43,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1231980.0, ans=0.0 2024-08-11 18:36:45,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1231980.0, ans=0.0 2024-08-11 18:36:51,913 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.618e+01 2.954e+01 3.399e+01 5.489e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 18:37:12,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1232180.0, ans=0.0 2024-08-11 18:37:29,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-11 18:37:39,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1232280.0, ans=0.0 2024-08-11 18:37:45,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7300, loss[loss=0.114, beats_loss=0.01315, ecapa_loss=0.0001784, whisper_loss=0.09911, over 21870.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001922, whisper_loss=0.09322, over 3941494.58 frames. ], batch size: 89, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:37:51,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1232380.0, ans=0.125 2024-08-11 18:37:51,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1232380.0, ans=0.0 2024-08-11 18:37:56,015 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 18:37:56,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2024-08-11 18:38:18,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1232580.0, ans=0.125 2024-08-11 18:39:09,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7350, loss[loss=0.1058, beats_loss=0.01308, ecapa_loss=0.0001762, whisper_loss=0.09099, over 22399.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01122, ecapa_loss=0.0001934, whisper_loss=0.0938, over 3918673.87 frames. ], batch size: 93, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:39:14,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1232880.0, ans=0.0 2024-08-11 18:39:19,777 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 18:39:29,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1232980.0, ans=0.025 2024-08-11 18:39:32,474 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-11 18:39:35,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1232980.0, ans=0.125 2024-08-11 18:39:39,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.156e+01 2.552e+01 3.033e+01 3.374e+01 5.510e+01, threshold=6.067e+01, percent-clipped=0.0 2024-08-11 18:39:55,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1233080.0, ans=0.0 2024-08-11 18:40:13,797 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 18:40:18,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-08-11 18:40:32,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7400, loss[loss=0.0925, beats_loss=0.01091, ecapa_loss=0.0001961, whisper_loss=0.07963, over 19598.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01125, ecapa_loss=0.0001931, whisper_loss=0.09334, over 3930225.71 frames. ], batch size: 79, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:40:45,893 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 18:41:08,736 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 18:41:22,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1233680.0, ans=0.125 2024-08-11 18:41:26,644 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 18:41:31,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1233680.0, ans=0.125 2024-08-11 18:41:47,154 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 18:41:52,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1233780.0, ans=0.125 2024-08-11 18:41:52,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2024-08-11 18:41:55,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7450, loss[loss=0.1103, beats_loss=0.01101, ecapa_loss=0.0001941, whisper_loss=0.09736, over 22747.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001938, whisper_loss=0.09307, over 3919527.64 frames. ], batch size: 91, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:42:11,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1233980.0, ans=0.125 2024-08-11 18:42:20,576 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 18:42:28,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+01 2.705e+01 3.012e+01 3.463e+01 6.106e+01, threshold=6.024e+01, percent-clipped=1.0 2024-08-11 18:42:29,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.24 vs. 
limit=15.0 2024-08-11 18:42:46,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1234080.0, ans=0.125 2024-08-11 18:42:54,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2024-08-11 18:42:57,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1234180.0, ans=0.125 2024-08-11 18:43:05,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1234280.0, ans=0.125 2024-08-11 18:43:06,425 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 18:43:22,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7500, loss[loss=0.1002, beats_loss=0.01087, ecapa_loss=0.000244, whisper_loss=0.08688, over 21292.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01119, ecapa_loss=0.0001959, whisper_loss=0.09303, over 3904992.95 frames. ], batch size: 91, lr: 7.02e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:43:33,145 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 18:43:33,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1234380.0, ans=0.05 2024-08-11 18:43:33,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-11 18:43:56,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1234580.0, ans=0.125 2024-08-11 18:43:59,826 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 18:44:13,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1234680.0, ans=0.1 2024-08-11 18:44:25,445 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:44:44,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7550, loss[loss=0.08428, beats_loss=0.01288, ecapa_loss=0.0002162, whisper_loss=0.06923, over 20162.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001946, whisper_loss=0.09303, over 3879812.63 frames. ], batch size: 88, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:44:44,549 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 18:44:44,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1234880.0, ans=0.09899494936611666 2024-08-11 18:44:52,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1234880.0, ans=0.125 2024-08-11 18:44:57,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=12.0 2024-08-11 18:45:11,311 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-11 18:45:12,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.591e+01 2.941e+01 3.490e+01 1.489e+02, threshold=5.883e+01, percent-clipped=2.0 2024-08-11 18:45:35,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1235180.0, ans=0.125 2024-08-11 18:46:05,607 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
15 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-11 18:46:07,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7600, loss[loss=0.07909, beats_loss=0.01527, ecapa_loss=0.0001443, whisper_loss=0.06237, over 17880.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0112, ecapa_loss=0.0001934, whisper_loss=0.09222, over 3825345.15 frames. ], batch size: 71, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:46:18,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1235380.0, ans=0.125 2024-08-11 18:46:34,273 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 18:46:36,282 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-11 18:46:52,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1235580.0, ans=0.0 2024-08-11 18:46:56,997 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 18:47:02,132 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 18:47:08,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1235680.0, ans=0.035 2024-08-11 18:47:26,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1235780.0, ans=0.2 2024-08-11 18:47:32,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1235880.0, ans=0.2 2024-08-11 18:47:34,262 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7650, loss[loss=0.09975, beats_loss=0.01109, ecapa_loss=0.0001863, whisper_loss=0.08679, over 23853.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01115, ecapa_loss=0.0001932, whisper_loss=0.09293, over 3861222.87 frames. 
], batch size: 95, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:47:51,433 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 18:48:04,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.623e+01 3.033e+01 3.717e+01 6.248e+01, threshold=6.065e+01, percent-clipped=1.0 2024-08-11 18:48:04,660 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 18:48:05,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2024-08-11 18:48:25,773 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-11 18:48:29,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1236180.0, ans=0.125 2024-08-11 18:48:52,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1236280.0, ans=0.0 2024-08-11 18:49:00,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7700, loss[loss=0.1032, beats_loss=0.01066, ecapa_loss=0.000204, whisper_loss=0.09052, over 22033.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01122, ecapa_loss=0.0001927, whisper_loss=0.09249, over 3864519.66 frames. ], batch size: 91, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:49:35,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1236580.0, ans=0.0 2024-08-11 18:49:55,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-08-11 18:50:04,606 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 18:50:07,740 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 18:50:22,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7750, loss[loss=0.1314, beats_loss=0.01028, ecapa_loss=0.0001854, whisper_loss=0.1193, over 21158.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01127, ecapa_loss=0.0001914, whisper_loss=0.09231, over 3883825.89 frames. ], batch size: 83, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:50:48,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1236980.0, ans=0.125 2024-08-11 18:50:52,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.693e+01 2.903e+01 3.373e+01 1.168e+02, threshold=5.806e+01, percent-clipped=1.0 2024-08-11 18:50:54,090 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 18:51:00,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1237080.0, ans=0.0 2024-08-11 18:51:03,352 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 33 from Vox, 41 fro AS 2024-08-11 18:51:04,682 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-11 18:51:18,313 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 18:51:19,542 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 18:51:36,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1237280.0, ans=0.125 2024-08-11 18:51:40,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1237380.0, ans=0.125 2024-08-11 18:51:41,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=12.0 2024-08-11 18:51:41,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7800, loss[loss=0.089, beats_loss=0.01342, ecapa_loss=0.0001724, whisper_loss=0.07386, over 19101.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01125, ecapa_loss=0.0001913, whisper_loss=0.09269, over 3900712.74 frames. ], batch size: 79, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:51:47,985 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 18:51:48,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1237380.0, ans=0.125 2024-08-11 18:51:49,286 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 18:52:00,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1237480.0, ans=0.125 2024-08-11 18:52:29,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1237680.0, ans=0.0 2024-08-11 18:52:41,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1237780.0, ans=0.0 2024-08-11 18:52:49,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1237780.0, ans=0.2 2024-08-11 18:52:53,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-11 18:52:57,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7850, loss[loss=0.1062, beats_loss=0.01136, ecapa_loss=0.0002036, whisper_loss=0.09285, over 18983.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0113, ecapa_loss=0.0001919, whisper_loss=0.09242, over 3887346.14 frames. ], batch size: 77, lr: 7.01e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:52:57,404 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
20 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-11 18:53:05,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1237880.0, ans=0.125 2024-08-11 18:53:20,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1237980.0, ans=0.125 2024-08-11 18:53:24,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.579e+01 2.865e+01 3.320e+01 8.816e+01, threshold=5.729e+01, percent-clipped=1.0 2024-08-11 18:54:13,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7900, loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001757, whisper_loss=0.09081, over 15538.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01136, ecapa_loss=0.0001907, whisper_loss=0.09265, over 3882827.24 frames. ], batch size: 58, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:54:30,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.03 vs. 
limit=22.5 2024-08-11 18:54:31,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1238480.0, ans=0.0 2024-08-11 18:54:38,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1238480.0, ans=0.125 2024-08-11 18:54:51,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1238580.0, ans=0.1 2024-08-11 18:54:52,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1238580.0, ans=0.125 2024-08-11 18:54:53,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1238580.0, ans=0.2 2024-08-11 18:55:01,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1238680.0, ans=0.125 2024-08-11 18:55:02,592 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 18:55:12,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2024-08-11 18:55:14,802 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 18:55:27,236 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 7950, loss[loss=0.09984, beats_loss=0.01342, ecapa_loss=0.000136, whisper_loss=0.08506, over 19762.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0114, ecapa_loss=0.0001897, whisper_loss=0.09262, over 3881540.75 frames. ], batch size: 75, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:55:28,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. 
limit=15.0 2024-08-11 18:55:38,363 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 13 from Vox, 51 fro AS 2024-08-11 18:55:39,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1238980.0, ans=0.125 2024-08-11 18:55:46,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1238980.0, ans=0.0 2024-08-11 18:55:49,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1238980.0, ans=0.2 2024-08-11 18:55:52,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.748e+01 3.056e+01 3.459e+01 5.765e+01, threshold=6.112e+01, percent-clipped=1.0 2024-08-11 18:55:55,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1239080.0, ans=0.1 2024-08-11 18:55:59,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1239080.0, ans=0.125 2024-08-11 18:56:22,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1239280.0, ans=0.125 2024-08-11 18:56:30,477 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 18:56:30,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1239280.0, ans=0.125 2024-08-11 18:56:37,552 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8000, loss[loss=0.1022, beats_loss=0.01204, ecapa_loss=0.0001848, whisper_loss=0.08836, over 21463.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01151, ecapa_loss=0.000188, whisper_loss=0.09244, over 3896968.35 frames. 
], batch size: 86, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:56:57,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.17 vs. limit=15.0 2024-08-11 18:57:07,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1239580.0, ans=0.1 2024-08-11 18:57:11,035 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 18:57:43,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-11 18:57:48,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8050, loss[loss=0.1002, beats_loss=0.01288, ecapa_loss=0.0001635, whisper_loss=0.08566, over 22878.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01143, ecapa_loss=0.0001877, whisper_loss=0.09319, over 3918267.67 frames. ], batch size: 93, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:57:50,933 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-11 18:57:54,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1239880.0, ans=0.125 2024-08-11 18:57:56,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=15.0 2024-08-11 18:57:58,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1239880.0, ans=0.1 2024-08-11 18:58:00,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1239980.0, ans=0.1 2024-08-11 18:58:02,652 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-124000.pt 2024-08-11 18:58:12,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1239980.0, ans=0.0 2024-08-11 18:58:14,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.885e+01 3.265e+01 3.759e+01 1.907e+02, threshold=6.530e+01, percent-clipped=2.0 2024-08-11 18:58:30,234 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-11 18:58:43,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2024-08-11 18:58:44,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1240280.0, ans=0.125 2024-08-11 18:58:45,658 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 18:58:56,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8100, loss[loss=0.08722, beats_loss=0.01259, ecapa_loss=0.0001682, whisper_loss=0.07295, over 23336.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.0113, ecapa_loss=0.0001897, whisper_loss=0.09347, over 3927875.94 frames. 
], batch size: 95, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 18:59:00,548 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 18:59:12,545 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 18:59:14,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240480.0, ans=0.1 2024-08-11 18:59:26,737 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 18:59:32,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240580.0, ans=0.1 2024-08-11 19:00:01,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1240880.0, ans=0.125 2024-08-11 19:00:02,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8150, loss[loss=0.1225, beats_loss=0.009375, ecapa_loss=0.0001955, whisper_loss=0.1112, over 18342.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0001893, whisper_loss=0.09365, over 3922104.65 frames. ], batch size: 72, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:00:02,752 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 19:00:07,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1240880.0, ans=0.0 2024-08-11 19:00:10,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1240880.0, ans=0.125 2024-08-11 19:00:19,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1240980.0, ans=0.125 2024-08-11 19:00:19,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1240980.0, ans=0.1 2024-08-11 19:00:23,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-08-11 19:00:24,299 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 19:00:26,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.042e+01 2.542e+01 2.871e+01 3.241e+01 4.432e+01, threshold=5.742e+01, percent-clipped=0.0 2024-08-11 19:00:32,192 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 19:00:36,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1241080.0, ans=0.0 2024-08-11 19:00:37,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1241080.0, ans=0.125 2024-08-11 19:00:54,609 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 19:01:08,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8200, loss[loss=0.09961, beats_loss=0.01195, ecapa_loss=0.0001755, whisper_loss=0.08591, over 20931.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01129, ecapa_loss=0.00019, whisper_loss=0.09284, over 3911982.30 frames. ], batch size: 83, lr: 7.00e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:01:17,794 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 19:01:19,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1241380.0, ans=0.1 2024-08-11 19:01:20,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1241480.0, ans=0.125 2024-08-11 19:01:23,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2024-08-11 19:01:35,785 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 19:01:36,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1241580.0, ans=0.05 2024-08-11 19:01:40,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1241580.0, ans=0.2 2024-08-11 19:01:42,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1241580.0, ans=0.0 2024-08-11 19:01:42,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1241580.0, ans=0.0 2024-08-11 19:01:43,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2024-08-11 19:01:43,870 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-11 19:01:48,201 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:01:54,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1241680.0, ans=0.0 2024-08-11 19:02:01,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1241780.0, ans=0.125 2024-08-11 19:02:09,276 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-11 19:02:14,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8250, loss[loss=0.09947, beats_loss=0.01523, ecapa_loss=0.0001808, whisper_loss=0.08244, over 18890.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0113, ecapa_loss=0.0001914, whisper_loss=0.09239, over 3901532.70 frames. ], batch size: 78, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:02:19,494 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-11 19:02:37,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.577e+01 2.823e+01 3.231e+01 7.611e+01, threshold=5.645e+01, percent-clipped=2.0 2024-08-11 19:02:42,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=12.0 2024-08-11 19:02:56,487 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 33 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 19:03:02,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.74 vs. 
limit=22.5 2024-08-11 19:03:04,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1242180.0, ans=0.0 2024-08-11 19:03:11,164 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-11 19:03:13,794 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 19:03:16,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1242280.0, ans=0.1 2024-08-11 19:03:18,868 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-11 19:03:19,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1242380.0, ans=0.1 2024-08-11 19:03:19,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8300, loss[loss=0.1115, beats_loss=0.009038, ecapa_loss=0.0002056, whisper_loss=0.1004, over 13881.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01128, ecapa_loss=0.0001927, whisper_loss=0.09205, over 3891463.63 frames. ], batch size: 54, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:03:20,126 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 19:03:23,989 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:03:47,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1242580.0, ans=0.07 2024-08-11 19:03:50,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1242580.0, ans=10.0 2024-08-11 19:03:51,882 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-11 19:04:04,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1242680.0, ans=0.1 2024-08-11 19:04:10,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1242680.0, ans=0.1 2024-08-11 19:04:17,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1242780.0, ans=0.0 2024-08-11 19:04:25,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8350, loss[loss=0.0919, beats_loss=0.01143, ecapa_loss=0.0001805, whisper_loss=0.07866, over 20267.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01131, ecapa_loss=0.0001932, whisper_loss=0.09179, over 3881359.45 frames. ], batch size: 82, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:04:28,574 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-11 19:04:30,214 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.066e-03 2024-08-11 19:04:37,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1242980.0, ans=0.125 2024-08-11 19:04:49,301 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.806e+01 3.050e+01 3.549e+01 1.399e+02, threshold=6.100e+01, percent-clipped=1.0 2024-08-11 19:04:55,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1243080.0, ans=0.0 2024-08-11 19:05:08,795 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 19:05:09,119 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.384e-01 2024-08-11 19:05:09,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-08-11 19:05:16,568 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-11 19:05:24,320 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-11 19:05:28,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1243280.0, ans=0.2 2024-08-11 19:05:30,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8400, loss[loss=0.1099, beats_loss=0.01133, ecapa_loss=0.0002162, whisper_loss=0.09643, over 17784.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01126, ecapa_loss=0.0001938, whisper_loss=0.0924, over 3891706.54 frames. ], batch size: 70, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:05:33,544 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 19:05:33,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1243380.0, ans=0.2 2024-08-11 19:05:46,669 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 19:05:59,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-11 19:06:09,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. 
limit=15.0 2024-08-11 19:06:37,001 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8450, loss[loss=0.08397, beats_loss=0.01358, ecapa_loss=0.0002131, whisper_loss=0.06825, over 14006.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.0001935, whisper_loss=0.09207, over 3864647.31 frames. ], batch size: 60, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:07:00,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.505e+01 2.848e+01 3.231e+01 4.188e+01, threshold=5.696e+01, percent-clipped=0.0 2024-08-11 19:07:22,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1244180.0, ans=0.125 2024-08-11 19:07:32,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1244280.0, ans=0.125 2024-08-11 19:07:41,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1244380.0, ans=0.1 2024-08-11 19:07:42,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8500, loss[loss=0.127, beats_loss=0.009397, ecapa_loss=0.0001897, whisper_loss=0.1157, over 23804.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01117, ecapa_loss=0.0001949, whisper_loss=0.09268, over 3843537.09 frames. ], batch size: 92, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:00,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2024-08-11 19:08:04,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-08-11 19:08:06,520 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
28 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-11 19:08:22,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1244680.0, ans=0.125 2024-08-11 19:08:29,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-08-11 19:08:34,450 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 19:08:39,707 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-11 19:08:41,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1244780.0, ans=0.2 2024-08-11 19:08:49,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8550, loss[loss=0.0583, beats_loss=0.01303, ecapa_loss=0.0001489, whisper_loss=0.04379, over 14645.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001938, whisper_loss=0.09285, over 3816373.01 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:08:56,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1244880.0, ans=0.015 2024-08-11 19:09:13,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.056e+01 2.649e+01 3.008e+01 3.594e+01 2.630e+02, threshold=6.016e+01, percent-clipped=2.0 2024-08-11 19:09:20,838 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 19:09:31,781 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-11 19:09:42,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1245280.0, ans=0.125 2024-08-11 19:09:43,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1245280.0, ans=22.5 2024-08-11 19:09:54,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8600, loss[loss=0.09348, beats_loss=0.01331, ecapa_loss=0.0001782, whisper_loss=0.07839, over 19688.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01107, ecapa_loss=0.0001939, whisper_loss=0.09311, over 3821189.46 frames. ], batch size: 80, lr: 6.99e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:09:56,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1245380.0, ans=0.0 2024-08-11 19:10:05,300 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 19:10:08,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1245480.0, ans=0.1 2024-08-11 19:10:46,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1245780.0, ans=0.125 2024-08-11 19:10:49,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1245780.0, ans=0.07 2024-08-11 19:11:00,845 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-11 19:11:01,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8650, loss[loss=0.1117, beats_loss=0.01015, ecapa_loss=0.0002176, whisper_loss=0.09938, over 21214.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001942, whisper_loss=0.09329, over 3826117.40 frames. 
], batch size: 89, lr: 6.98e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:11:03,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1245880.0, ans=0.1 2024-08-11 19:11:03,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1245880.0, ans=0.125 2024-08-11 19:11:07,640 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-11 19:11:18,676 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-11 19:11:24,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1245980.0, ans=0.125 2024-08-11 19:11:26,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+01 2.702e+01 2.920e+01 3.348e+01 5.833e+01, threshold=5.840e+01, percent-clipped=0.0 2024-08-11 19:11:26,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1245980.0, ans=0.125 2024-08-11 19:11:29,207 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-11 19:11:48,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1246180.0, ans=0.125 2024-08-11 19:11:50,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2024-08-11 19:11:55,572 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-11 19:11:57,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. 
limit=10.0 2024-08-11 19:11:59,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1246280.0, ans=0.0 2024-08-11 19:12:12,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8700, loss[loss=0.09192, beats_loss=0.01401, ecapa_loss=0.0001711, whisper_loss=0.0762, over 22484.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01116, ecapa_loss=0.0001948, whisper_loss=0.09321, over 3839780.23 frames. ], batch size: 94, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:12:23,823 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-11 19:12:29,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1246480.0, ans=0.125 2024-08-11 19:12:39,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1246480.0, ans=0.1 2024-08-11 19:12:48,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1246580.0, ans=0.1 2024-08-11 19:12:54,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.09 vs. limit=10.0 2024-08-11 19:13:24,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1246780.0, ans=0.0 2024-08-11 19:13:31,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8750, loss[loss=0.1064, beats_loss=0.01038, ecapa_loss=0.0001787, whisper_loss=0.0942, over 20165.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01113, ecapa_loss=0.0001953, whisper_loss=0.09349, over 3820155.83 frames. 
], batch size: 81, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:13:47,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-11 19:13:58,583 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-11 19:14:02,181 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.729e+01 3.149e+01 3.725e+01 7.299e+01, threshold=6.297e+01, percent-clipped=2.0 2024-08-11 19:14:02,314 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 19:14:24,453 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-11 19:14:27,388 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 19:14:39,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-08-11 19:14:47,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1247280.0, ans=0.2 2024-08-11 19:14:47,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1247280.0, ans=0.1 2024-08-11 19:14:56,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8800, loss[loss=0.1062, beats_loss=0.009872, ecapa_loss=0.0002346, whisper_loss=0.09402, over 22687.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01122, ecapa_loss=0.0001941, whisper_loss=0.0931, over 3874484.88 frames. 
], batch size: 93, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:15:03,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1247380.0, ans=0.2 2024-08-11 19:15:04,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1247380.0, ans=0.125 2024-08-11 19:15:11,841 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 19:15:31,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1247580.0, ans=10.0 2024-08-11 19:16:21,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8850, loss[loss=0.09376, beats_loss=0.01188, ecapa_loss=0.0001962, whisper_loss=0.07991, over 17387.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0113, ecapa_loss=0.0001941, whisper_loss=0.09267, over 3886418.56 frames. ], batch size: 70, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:16:21,791 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-11 19:16:29,459 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 19:16:52,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.673e+01 2.972e+01 3.544e+01 5.278e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-11 19:17:07,056 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 19:17:14,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.03 vs. 
limit=12.0 2024-08-11 19:17:22,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1248180.0, ans=0.015 2024-08-11 19:17:47,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8900, loss[loss=0.1077, beats_loss=0.009574, ecapa_loss=0.0001761, whisper_loss=0.09637, over 16166.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01127, ecapa_loss=0.0001928, whisper_loss=0.09279, over 3857700.46 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:17:53,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1248380.0, ans=0.125 2024-08-11 19:18:04,172 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-11 19:18:04,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1248480.0, ans=0.125 2024-08-11 19:19:14,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 8950, loss[loss=0.09894, beats_loss=0.01341, ecapa_loss=0.0002026, whisper_loss=0.08351, over 17617.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001935, whisper_loss=0.09309, over 3862638.33 frames. ], batch size: 73, lr: 6.98e-03, grad_scale: 1.152921504606847e+18 2024-08-11 19:19:28,285 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 17 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 19:19:38,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1248980.0, ans=0.125 2024-08-11 19:19:44,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+01 2.588e+01 3.053e+01 3.414e+01 5.392e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-11 19:19:50,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.48 vs. 
limit=15.0 2024-08-11 19:20:11,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-11 19:20:11,887 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-11 19:20:29,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1249280.0, ans=0.02 2024-08-11 19:20:34,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-11 19:20:38,769 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9000, loss[loss=0.1098, beats_loss=0.009675, ecapa_loss=0.0001861, whisper_loss=0.09822, over 17530.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001926, whisper_loss=0.0927, over 3861246.87 frames. ], batch size: 68, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:20:38,771 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 19:20:55,439 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8849, 2.3542, 3.5081, 3.4713], device='cuda:0') 2024-08-11 19:21:20,499 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on ASR_libri: loss=0.2565, beats_loss=0, ecapa_loss=0.0006239, whisper_loss=0.2503, over 922467.00 frames. 2024-08-11 19:21:39,249 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on SV_voxceleb1: loss=0.005312, beats_loss=0, ecapa_loss=0.0005312, whisper_loss=0, over 939242.00 frames. 2024-08-11 19:23:36,249 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on AT_audioset: loss=0.02491, beats_loss=0.02491, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-11 19:23:36,254 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 19:24:30,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1249680.0, ans=0.125 2024-08-11 19:24:31,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1249680.0, ans=0.125 2024-08-11 19:25:00,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9050, loss[loss=0.1003, beats_loss=0.0139, ecapa_loss=0.0001799, whisper_loss=0.08461, over 13805.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01124, ecapa_loss=0.0001917, whisper_loss=0.09287, over 3829183.00 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:25:03,183 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 19:25:12,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-11 19:25:32,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.548e+01 2.793e+01 3.280e+01 4.630e+01, threshold=5.586e+01, percent-clipped=0.0 2024-08-11 19:25:38,391 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-11 19:25:45,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1250080.0, ans=0.125 2024-08-11 19:25:52,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-08-11 19:26:04,854 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 24 from Vox, 29 from AS 2024-08-11 19:26:26,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9100, loss[loss=0.09747, beats_loss=0.01163, ecapa_loss=0.0001845, whisper_loss=0.084, over 21554.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.0001917, whisper_loss=0.09339, over 3880167.16 frames. ], batch size: 89, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:26:29,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0 2024-08-11 19:26:33,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-11 19:26:44,367 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-11 19:26:44,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1250480.0, ans=0.2 2024-08-11 19:26:56,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1250480.0, ans=0.125 2024-08-11 19:27:12,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1250580.0, ans=0.0 2024-08-11 19:27:18,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1250580.0, ans=0.0 2024-08-11 19:27:37,392 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 from AS 2024-08-11 19:27:44,922 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 from AS 2024-08-11 19:27:52,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9150, loss[loss=0.1205, beats_loss=0.00753, ecapa_loss=0.0002242, whisper_loss=0.1107, over 21772.00 frames. 
], tot_loss[loss=0.1065, beats_loss=0.01118, ecapa_loss=0.000192, whisper_loss=0.09336, over 3887705.45 frames. ], batch size: 88, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:27:55,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2024-08-11 19:28:02,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1250880.0, ans=0.125 2024-08-11 19:28:11,634 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 from AS 2024-08-11 19:28:22,079 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 from AS 2024-08-11 19:28:23,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.605e+01 2.841e+01 3.221e+01 5.369e+01, threshold=5.683e+01, percent-clipped=0.0 2024-08-11 19:28:37,889 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 from AS 2024-08-11 19:28:42,863 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:29:12,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9200, loss[loss=0.1173, beats_loss=0.009033, ecapa_loss=0.0001725, whisper_loss=0.1066, over 23171.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0001913, whisper_loss=0.09329, over 3905348.13 frames. ], batch size: 88, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:29:16,012 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 from AS 2024-08-11 19:29:22,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. 
limit=22.5 2024-08-11 19:29:33,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1251480.0, ans=0.2 2024-08-11 19:29:41,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1251580.0, ans=0.125 2024-08-11 19:29:47,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1251580.0, ans=0.0 2024-08-11 19:29:50,866 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 15 from LS+wenet, 27 from Vox, 49 from AS 2024-08-11 19:29:55,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1251580.0, ans=10.0 2024-08-11 19:29:55,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=12.0 2024-08-11 19:30:01,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1251680.0, ans=0.125 2024-08-11 19:30:14,873 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 14 from Vox, 21 from AS 2024-08-11 19:30:18,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1251780.0, ans=0.1 2024-08-11 19:30:25,589 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS 2024-08-11 19:30:28,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9250, loss[loss=0.09617, beats_loss=0.01223, ecapa_loss=0.0001775, whisper_loss=0.08216, over 21174.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01121, ecapa_loss=0.000193, whisper_loss=0.09281, over 3932496.94 frames. ], batch size: 88, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:30:28,732 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 15 from Vox, 25 from AS 2024-08-11 19:30:33,387 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-11 19:30:34,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-08-11 19:30:57,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.240e+01 2.691e+01 2.985e+01 3.626e+01 6.428e+01, threshold=5.970e+01, percent-clipped=0.0 2024-08-11 19:31:21,595 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 from AS 2024-08-11 19:31:40,901 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 from AS 2024-08-11 19:31:46,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9300, loss[loss=0.09511, beats_loss=0.01383, ecapa_loss=0.0001728, whisper_loss=0.07955, over 19721.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001931, whisper_loss=0.09298, over 3882262.46 frames. ], batch size: 80, lr: 6.97e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:31:53,112 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 35 from LS+wenet, 17 from Vox, 33 from AS 2024-08-11 19:32:15,684 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-11 19:32:31,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1252680.0, ans=0.125 2024-08-11 19:32:56,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1252780.0, ans=0.125 2024-08-11 19:32:56,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=15.0 2024-08-11 19:33:05,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9350, loss[loss=0.1134, beats_loss=0.01231, ecapa_loss=0.0002257, whisper_loss=0.0988, over 21642.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01118, ecapa_loss=0.0001927, whisper_loss=0.09248, over 3873315.22 frames. ], batch size: 92, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:33:06,986 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 from AS 2024-08-11 19:33:11,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1252880.0, ans=0.125 2024-08-11 19:33:21,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1252980.0, ans=0.125 2024-08-11 19:33:30,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1252980.0, ans=0.2 2024-08-11 19:33:35,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.606e+01 3.008e+01 3.444e+01 5.189e+01, threshold=6.015e+01, percent-clipped=1.0 2024-08-11 19:33:41,212 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 from AS 2024-08-11 19:33:46,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1253080.0, ans=0.0 2024-08-11 19:33:56,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1253180.0, ans=0.0 2024-08-11 19:34:16,253 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
35 from LS+wenet, 24 from Vox, 35 from AS 2024-08-11 19:34:16,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1253280.0, ans=0.125 2024-08-11 19:34:17,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1253280.0, ans=0.07 2024-08-11 19:34:22,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9400, loss[loss=0.1002, beats_loss=0.00937, ecapa_loss=0.000252, whisper_loss=0.08832, over 20735.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01108, ecapa_loss=0.0001937, whisper_loss=0.09318, over 3904541.35 frames. ], batch size: 89, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:34:28,521 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 from AS 2024-08-11 19:34:37,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1253480.0, ans=0.1 2024-08-11 19:34:44,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1253480.0, ans=0.125 2024-08-11 19:34:54,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1253580.0, ans=0.125 2024-08-11 19:35:08,201 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 from AS 2024-08-11 19:35:10,949 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 from AS 2024-08-11 19:35:20,180 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 19:35:20,997 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS 2024-08-11 19:35:22,964 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
22 from LS+wenet, 12 from Vox, 21 from AS 2024-08-11 19:35:25,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1253780.0, ans=0.125 2024-08-11 19:35:34,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-11 19:35:37,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9450, loss[loss=0.0873, beats_loss=0.01217, ecapa_loss=0.0001665, whisper_loss=0.07346, over 18662.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01115, ecapa_loss=0.0001936, whisper_loss=0.09305, over 3887513.51 frames. ], batch size: 75, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:35:46,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2024-08-11 19:35:46,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.39 vs. limit=10.0 2024-08-11 19:35:49,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. limit=10.0 2024-08-11 19:35:52,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1253980.0, ans=0.125 2024-08-11 19:36:01,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.693e+01 3.099e+01 3.778e+01 6.565e+01, threshold=6.199e+01, percent-clipped=1.0 2024-08-11 19:36:02,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1254080.0, ans=0.125 2024-08-11 19:36:21,153 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 22 from Vox, 36 from AS 2024-08-11 19:36:40,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=15.0 2024-08-11 19:36:41,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1254280.0, ans=0.07 2024-08-11 19:36:43,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9500, loss[loss=0.1071, beats_loss=0.01206, ecapa_loss=0.0001874, whisper_loss=0.09316, over 21681.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01116, ecapa_loss=0.0001919, whisper_loss=0.09343, over 3903490.77 frames. ], batch size: 89, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:36:51,808 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 from AS 2024-08-11 19:37:18,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1254580.0, ans=0.1 2024-08-11 19:37:24,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1254680.0, ans=0.125 2024-08-11 19:37:34,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1254680.0, ans=0.1 2024-08-11 19:37:46,773 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS 2024-08-11 19:37:49,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9550, loss[loss=0.1144, beats_loss=0.01104, ecapa_loss=0.0002119, whisper_loss=0.1013, over 20015.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01114, ecapa_loss=0.0001925, whisper_loss=0.09268, over 3882128.22 frames. 
], batch size: 85, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:37:49,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1254880.0, ans=0.1 2024-08-11 19:37:51,773 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 32 from LS+wenet, 24 from Vox, 16 from AS 2024-08-11 19:37:55,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1254880.0, ans=0.125 2024-08-11 19:38:08,669 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 20 from Vox, 36 from AS 2024-08-11 19:38:13,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.508e+01 2.726e+01 3.017e+01 8.338e+01, threshold=5.453e+01, percent-clipped=1.0 2024-08-11 19:38:15,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-08-11 19:38:24,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1255080.0, ans=10.0 2024-08-11 19:38:41,957 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS 2024-08-11 19:38:49,671 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 22 from Vox, 46 from AS 2024-08-11 19:38:54,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9600, loss[loss=0.09803, beats_loss=0.01072, ecapa_loss=0.0001858, whisper_loss=0.08545, over 22061.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01113, ecapa_loss=0.0001919, whisper_loss=0.09346, over 3869527.98 frames. ], batch size: 89, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:39:19,271 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
15 from LS+wenet, 20 from Vox, 30 from AS 2024-08-11 19:39:30,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1255580.0, ans=0.0 2024-08-11 19:39:33,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1255580.0, ans=0.125 2024-08-11 19:39:39,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1255680.0, ans=0.0 2024-08-11 19:40:02,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9650, loss[loss=0.07585, beats_loss=0.01376, ecapa_loss=0.0001634, whisper_loss=0.06046, over 18278.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01119, ecapa_loss=0.0001924, whisper_loss=0.09208, over 3835146.17 frames. ], batch size: 73, lr: 6.96e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:40:09,110 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 from AS 2024-08-11 19:40:27,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+01 2.826e+01 3.085e+01 3.592e+01 1.036e+02, threshold=6.169e+01, percent-clipped=1.0 2024-08-11 19:40:28,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1256080.0, ans=0.1 2024-08-11 19:40:37,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1256080.0, ans=0.125 2024-08-11 19:40:44,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1256180.0, ans=0.0 2024-08-11 19:40:46,328 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
26 from LS+wenet, 16 from Vox, 30 from AS 2024-08-11 19:41:01,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1256280.0, ans=0.0 2024-08-11 19:41:03,801 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 from AS 2024-08-11 19:41:08,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9700, loss[loss=0.1244, beats_loss=0.01031, ecapa_loss=0.0001781, whisper_loss=0.1123, over 24120.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.0001938, whisper_loss=0.09281, over 3854924.00 frames. ], batch size: 94, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:41:10,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1256380.0, ans=0.125 2024-08-11 19:41:31,426 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 from AS 2024-08-11 19:41:34,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1256580.0, ans=0.0 2024-08-11 19:41:35,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1256580.0, ans=0.1 2024-08-11 19:41:55,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1256680.0, ans=0.125 2024-08-11 19:41:59,357 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS 2024-08-11 19:42:13,724 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 15 from Vox, 25 from AS 2024-08-11 19:42:13,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1256880.0, ans=0.125 2024-08-11 19:42:14,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9750, loss[loss=0.1122, beats_loss=0.009663, ecapa_loss=0.0001753, whisper_loss=0.1008, over 16411.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01112, ecapa_loss=0.0001941, whisper_loss=0.09334, over 3863236.71 frames. ], batch size: 63, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:42:27,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1256980.0, ans=0.1 2024-08-11 19:42:34,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1256980.0, ans=0.0 2024-08-11 19:42:40,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.576e+01 2.817e+01 3.279e+01 5.572e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-11 19:42:42,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1257080.0, ans=15.0 2024-08-11 19:42:43,351 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 from AS 2024-08-11 19:43:08,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1257280.0, ans=0.1 2024-08-11 19:43:21,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9800, loss[loss=0.1164, beats_loss=0.009343, ecapa_loss=0.0001821, whisper_loss=0.1052, over 14187.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01109, ecapa_loss=0.000193, whisper_loss=0.09361, over 3857713.78 frames. 
], batch size: 55, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:43:31,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-11 19:43:32,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1257380.0, ans=0.2 2024-08-11 19:43:39,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1257480.0, ans=0.125 2024-08-11 19:43:59,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0 2024-08-11 19:44:00,209 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS 2024-08-11 19:44:05,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1257680.0, ans=0.2 2024-08-11 19:44:19,932 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 30 from Vox, 26 from AS 2024-08-11 19:44:20,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2024-08-11 19:44:26,440 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9850, loss[loss=0.08233, beats_loss=0.01402, ecapa_loss=0.0001981, whisper_loss=0.06633, over 16405.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01101, ecapa_loss=0.0001918, whisper_loss=0.09436, over 3881120.91 frames. 
], batch size: 66, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:44:51,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.685e+01 3.037e+01 3.617e+01 4.839e+01, threshold=6.074e+01, percent-clipped=0.0 2024-08-11 19:44:59,515 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.534e+05 2024-08-11 19:45:03,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1258080.0, ans=0.2 2024-08-11 19:45:06,072 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-11 19:45:07,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1258180.0, ans=0.125 2024-08-11 19:45:24,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1258280.0, ans=0.125 2024-08-11 19:45:31,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9900, loss[loss=0.1084, beats_loss=0.01099, ecapa_loss=0.0001822, whisper_loss=0.09555, over 18564.00 frames. ], tot_loss[loss=0.1075, beats_loss=0.01101, ecapa_loss=0.0001909, whisper_loss=0.09456, over 3870032.96 frames. ], batch size: 72, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:45:44,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.61 vs. 
limit=15.0 2024-08-11 19:45:46,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1258480.0, ans=0.0 2024-08-11 19:46:04,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1258580.0, ans=0.0 2024-08-11 19:46:08,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1258580.0, ans=0.1 2024-08-11 19:46:21,085 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 from AS 2024-08-11 19:46:25,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1258780.0, ans=0.125 2024-08-11 19:46:36,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 9950, loss[loss=0.1016, beats_loss=0.01213, ecapa_loss=0.0002344, whisper_loss=0.08715, over 17640.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01116, ecapa_loss=0.0001893, whisper_loss=0.09384, over 3862545.11 frames. ], batch size: 74, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:46:39,269 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 from AS 2024-08-11 19:46:39,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2024-08-11 19:47:00,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1258980.0, ans=0.1 2024-08-11 19:47:01,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.542e+01 2.819e+01 3.280e+01 8.897e+01, threshold=5.637e+01, percent-clipped=1.0 2024-08-11 19:47:08,744 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 16 from Vox, 38 from AS 2024-08-11 19:47:11,366 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 12 from LS+wenet, 18 from Vox, 39 from AS 2024-08-11 19:47:13,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1259080.0, ans=0.125 2024-08-11 19:47:16,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1259180.0, ans=0.125 2024-08-11 19:47:30,818 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 from AS 2024-08-11 19:47:39,893 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 13 from Vox, 31 from AS 2024-08-11 19:47:42,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10000, loss[loss=0.1245, beats_loss=0.0104, ecapa_loss=0.000191, whisper_loss=0.1122, over 16040.00 frames. ], tot_loss[loss=0.1071, beats_loss=0.01117, ecapa_loss=0.0001917, whisper_loss=0.09406, over 3853174.32 frames. 
], batch size: 62, lr: 6.95e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:47:50,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1259380.0, ans=0.0 2024-08-11 19:47:52,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1259380.0, ans=22.5 2024-08-11 19:47:53,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1259380.0, ans=0.0 2024-08-11 19:47:57,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1259480.0, ans=0.1 2024-08-11 19:48:30,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1259680.0, ans=0.125 2024-08-11 19:48:35,011 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-11 19:48:40,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1259780.0, ans=0.0 2024-08-11 19:48:41,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1259780.0, ans=0.125 2024-08-11 19:48:47,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10050, loss[loss=0.0932, beats_loss=0.0112, ecapa_loss=0.0002184, whisper_loss=0.07981, over 16827.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01118, ecapa_loss=0.0001906, whisper_loss=0.09293, over 3861594.65 frames. ], batch size: 70, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:48:55,509 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 18 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-11 19:49:02,388 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-11 19:49:03,583 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-11 19:49:12,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+01 2.712e+01 3.023e+01 3.510e+01 5.543e+01, threshold=6.045e+01, percent-clipped=0.0 2024-08-11 19:49:15,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1260080.0, ans=0.125 2024-08-11 19:49:16,789 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 35 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 19:49:17,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1260080.0, ans=0.2 2024-08-11 19:49:46,999 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-11 19:49:52,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10100, loss[loss=0.1032, beats_loss=0.01327, ecapa_loss=0.0001594, whisper_loss=0.08829, over 23445.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01124, ecapa_loss=0.0001888, whisper_loss=0.09303, over 3864490.83 frames. ], batch size: 95, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:50:18,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1260580.0, ans=0.0 2024-08-11 19:50:19,418 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 19:50:19,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1260580.0, ans=0.0 2024-08-11 19:50:55,438 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 19:50:58,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10150, loss[loss=0.08123, beats_loss=0.01088, ecapa_loss=0.0001152, whisper_loss=0.06919, over 16087.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01117, ecapa_loss=0.0001899, whisper_loss=0.09288, over 3882021.25 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:51:14,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1260980.0, ans=0.05 2024-08-11 19:51:23,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.653e+01 2.999e+01 3.558e+01 5.617e+01, threshold=5.997e+01, percent-clipped=0.0 2024-08-11 19:51:25,833 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 19:51:27,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1261080.0, ans=0.125 2024-08-11 19:51:32,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1261080.0, ans=0.125 2024-08-11 19:51:36,503 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.923e+00 2024-08-11 19:51:38,804 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 36 from Vox, 34 fro AS 2024-08-11 19:51:41,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1261180.0, ans=0.125 2024-08-11 19:51:50,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1261280.0, ans=0.2 2024-08-11 19:51:55,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. 
limit=15.0 2024-08-11 19:52:03,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10200, loss[loss=0.09978, beats_loss=0.01102, ecapa_loss=0.0001494, whisper_loss=0.08726, over 18405.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01109, ecapa_loss=0.0001914, whisper_loss=0.09341, over 3900742.22 frames. ], batch size: 71, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:52:18,310 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-11 19:52:31,547 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 19:52:34,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2024-08-11 19:52:41,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1261580.0, ans=0.2 2024-08-11 19:52:46,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1261680.0, ans=0.0 2024-08-11 19:52:57,742 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 19:53:09,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10250, loss[loss=0.09738, beats_loss=0.01175, ecapa_loss=0.0001715, whisper_loss=0.08392, over 21096.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01103, ecapa_loss=0.0001928, whisper_loss=0.09407, over 3920717.51 frames. ], batch size: 84, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:53:25,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1261980.0, ans=0.125 2024-08-11 19:53:27,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. 
limit=15.0 2024-08-11 19:53:33,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-08-11 19:53:33,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.624e+01 2.927e+01 3.242e+01 1.065e+02, threshold=5.855e+01, percent-clipped=3.0 2024-08-11 19:53:34,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1262080.0, ans=0.0 2024-08-11 19:53:38,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1262080.0, ans=0.125 2024-08-11 19:53:51,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-08-11 19:54:10,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1262280.0, ans=0.125 2024-08-11 19:54:15,336 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10300, loss[loss=0.1051, beats_loss=0.01131, ecapa_loss=0.0001816, whisper_loss=0.09201, over 15036.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01114, ecapa_loss=0.0001922, whisper_loss=0.0927, over 3902781.47 frames. 
], batch size: 57, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:54:26,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1262380.0, ans=0.125 2024-08-11 19:54:33,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1262480.0, ans=0.125 2024-08-11 19:54:56,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1262680.0, ans=0.0 2024-08-11 19:55:01,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1262680.0, ans=0.1 2024-08-11 19:55:10,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1262780.0, ans=0.125 2024-08-11 19:55:13,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1262780.0, ans=0.125 2024-08-11 19:55:17,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-11 19:55:20,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10350, loss[loss=0.1164, beats_loss=0.009501, ecapa_loss=0.0001788, whisper_loss=0.1051, over 17283.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01112, ecapa_loss=0.000193, whisper_loss=0.09305, over 3899732.47 frames. 
], batch size: 66, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:55:21,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1262880.0, ans=0.1 2024-08-11 19:55:27,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1262880.0, ans=0.0 2024-08-11 19:55:40,729 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 19:55:45,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.735e+01 3.032e+01 3.459e+01 9.732e+01, threshold=6.064e+01, percent-clipped=1.0 2024-08-11 19:56:07,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-08-11 19:56:14,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1263280.0, ans=0.125 2024-08-11 19:56:16,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1263280.0, ans=0.125 2024-08-11 19:56:26,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10400, loss[loss=0.1063, beats_loss=0.01185, ecapa_loss=0.0001821, whisper_loss=0.09267, over 16930.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01114, ecapa_loss=0.0001934, whisper_loss=0.09277, over 3891277.93 frames. ], batch size: 69, lr: 6.94e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:56:31,772 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 34 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 19:56:36,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1263380.0, ans=0.125 2024-08-11 19:56:46,435 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 19:57:01,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1263580.0, ans=0.2 2024-08-11 19:57:09,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1263680.0, ans=0.0 2024-08-11 19:57:14,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1263680.0, ans=0.2 2024-08-11 19:57:31,797 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10450, loss[loss=0.105, beats_loss=0.01177, ecapa_loss=0.0001929, whisper_loss=0.09126, over 20026.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01124, ecapa_loss=0.0001918, whisper_loss=0.09176, over 3857803.60 frames. ], batch size: 81, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:57:45,415 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 19:57:49,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1263980.0, ans=0.125 2024-08-11 19:57:53,005 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 19:57:56,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.575e+01 2.883e+01 3.290e+01 7.177e+01, threshold=5.767e+01, percent-clipped=1.0 2024-08-11 19:58:07,155 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 19:58:11,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. 
limit=15.0 2024-08-11 19:58:21,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1264180.0, ans=0.125 2024-08-11 19:58:36,978 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10500, loss[loss=0.08675, beats_loss=0.009289, ecapa_loss=0.0001805, whisper_loss=0.07565, over 14276.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01129, ecapa_loss=0.0001916, whisper_loss=0.09167, over 3863142.98 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:58:46,613 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-11 19:59:38,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1264780.0, ans=0.125 2024-08-11 19:59:43,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10550, loss[loss=0.1049, beats_loss=0.01308, ecapa_loss=0.000198, whisper_loss=0.0898, over 18200.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01131, ecapa_loss=0.0001925, whisper_loss=0.09118, over 3857576.68 frames. ], batch size: 76, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 19:59:52,737 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-11 19:59:53,915 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-11 20:00:08,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.610e+01 2.840e+01 3.443e+01 6.303e+01, threshold=5.679e+01, percent-clipped=1.0 2024-08-11 20:00:12,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1265080.0, ans=0.0 2024-08-11 20:00:17,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265080.0, ans=0.1 2024-08-11 20:00:27,181 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-11 20:00:27,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1265180.0, ans=0.1 2024-08-11 20:00:40,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 20:00:45,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1265280.0, ans=0.05 2024-08-11 20:00:49,417 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10600, loss[loss=0.1102, beats_loss=0.01003, ecapa_loss=0.000205, whisper_loss=0.09815, over 23522.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01125, ecapa_loss=0.0001926, whisper_loss=0.09215, over 3887095.85 frames. 
], batch size: 93, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:00:54,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1265380.0, ans=0.125 2024-08-11 20:01:10,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1265480.0, ans=0.1 2024-08-11 20:01:27,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1265580.0, ans=0.0 2024-08-11 20:01:29,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-08-11 20:01:40,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1265680.0, ans=0.125 2024-08-11 20:01:43,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1265780.0, ans=0.125 2024-08-11 20:01:45,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1265780.0, ans=0.125 2024-08-11 20:01:55,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10650, loss[loss=0.08799, beats_loss=0.01446, ecapa_loss=0.0001447, whisper_loss=0.07208, over 19257.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0112, ecapa_loss=0.0001915, whisper_loss=0.09311, over 3892557.10 frames. ], batch size: 76, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:01:57,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.44 vs. 
limit=15.0 2024-08-11 20:02:04,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1265880.0, ans=0.1 2024-08-11 20:02:05,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2024-08-11 20:02:09,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1265980.0, ans=0.125 2024-08-11 20:02:21,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.841e+01 3.157e+01 3.812e+01 6.518e+01, threshold=6.314e+01, percent-clipped=4.0 2024-08-11 20:02:49,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1266280.0, ans=0.0 2024-08-11 20:03:02,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10700, loss[loss=0.1085, beats_loss=0.01133, ecapa_loss=0.0001339, whisper_loss=0.09588, over 21369.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01132, ecapa_loss=0.000189, whisper_loss=0.09293, over 3906501.56 frames. ], batch size: 79, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:03:17,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-11 20:03:47,683 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 20:03:55,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1266780.0, ans=0.1 2024-08-11 20:03:59,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. 
limit=10.0 2024-08-11 20:04:09,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10750, loss[loss=0.09175, beats_loss=0.0144, ecapa_loss=0.0001541, whisper_loss=0.0758, over 23008.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01138, ecapa_loss=0.0001878, whisper_loss=0.09261, over 3903259.24 frames. ], batch size: 93, lr: 6.93e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:04:36,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.628e+01 2.928e+01 3.321e+01 7.388e+01, threshold=5.856e+01, percent-clipped=1.0 2024-08-11 20:04:40,003 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 20:04:58,559 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 20:05:12,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1267280.0, ans=0.2 2024-08-11 20:05:17,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267280.0, ans=0.1 2024-08-11 20:05:19,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10800, loss[loss=0.1171, beats_loss=0.01013, ecapa_loss=0.0001726, whisper_loss=0.1052, over 17593.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01135, ecapa_loss=0.0001881, whisper_loss=0.0922, over 3879464.52 frames. ], batch size: 67, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:05:23,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1267380.0, ans=0.2 2024-08-11 20:05:23,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1267380.0, ans=0.1 2024-08-11 20:05:29,100 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-11 20:05:38,237 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 20:05:42,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1267480.0, ans=0.125 2024-08-11 20:05:43,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1267480.0, ans=0.07 2024-08-11 20:05:54,705 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-11 20:06:01,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1267580.0, ans=0.2 2024-08-11 20:06:20,999 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-11 20:06:30,571 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 20:06:35,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10850, loss[loss=0.09937, beats_loss=0.01154, ecapa_loss=0.0001954, whisper_loss=0.08588, over 21206.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01133, ecapa_loss=0.0001886, whisper_loss=0.09255, over 3887124.95 frames. ], batch size: 85, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:06:47,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1267880.0, ans=0.0 2024-08-11 20:06:48,156 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 20:07:05,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.647e+01 2.915e+01 3.241e+01 5.191e+01, threshold=5.831e+01, percent-clipped=0.0 2024-08-11 20:07:26,481 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
25 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-11 20:07:26,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1268180.0, ans=0.0 2024-08-11 20:07:27,806 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 20:07:40,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1268280.0, ans=0.125 2024-08-11 20:07:49,500 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-11 20:07:53,829 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10900, loss[loss=0.1114, beats_loss=0.01194, ecapa_loss=0.0002016, whisper_loss=0.09748, over 22647.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01131, ecapa_loss=0.0001903, whisper_loss=0.09295, over 3920907.84 frames. ], batch size: 93, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:07:59,111 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 20:08:04,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=12.0 2024-08-11 20:08:11,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1268380.0, ans=0.1 2024-08-11 20:08:12,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=12.0 2024-08-11 20:08:20,552 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-11 20:08:21,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=12.0 2024-08-11 20:08:58,502 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-11 20:09:11,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-08-11 20:09:13,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1268780.0, ans=0.1 2024-08-11 20:09:19,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 10950, loss[loss=0.1244, beats_loss=0.01067, ecapa_loss=0.0002347, whisper_loss=0.1113, over 22861.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01122, ecapa_loss=0.0001925, whisper_loss=0.09303, over 3939127.38 frames. ], batch size: 92, lr: 6.92e-03, grad_scale: 5.764607523034235e+17 2024-08-11 20:09:46,548 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-11 20:09:47,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-11 20:10:01,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.627e+01 3.007e+01 3.464e+01 1.236e+02, threshold=6.014e+01, percent-clipped=3.0 2024-08-11 20:10:09,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1269080.0, ans=0.0 2024-08-11 20:10:19,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1269080.0, ans=10.0 2024-08-11 20:10:46,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. 
limit=15.0 2024-08-11 20:11:04,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1269280.0, ans=0.125 2024-08-11 20:11:08,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11000, loss[loss=0.1098, beats_loss=0.00763, ecapa_loss=0.0001934, whisper_loss=0.1002, over 18921.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01112, ecapa_loss=0.0001934, whisper_loss=0.0937, over 3956263.26 frames. ], batch size: 74, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:11:15,134 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 20:11:21,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-11 20:11:40,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1269480.0, ans=0.1 2024-08-11 20:11:42,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1269480.0, ans=0.2 2024-08-11 20:11:55,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1269580.0, ans=0.0 2024-08-11 20:12:09,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.42 vs. limit=22.5 2024-08-11 20:12:19,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1269680.0, ans=0.125 2024-08-11 20:12:27,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.27 vs. limit=22.5 2024-08-11 20:12:31,787 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-11 20:12:49,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11050, loss[loss=0.1174, beats_loss=0.01179, ecapa_loss=0.0001686, whisper_loss=0.1039, over 18151.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01111, ecapa_loss=0.0001927, whisper_loss=0.09291, over 3941824.85 frames. ], batch size: 71, lr: 6.92e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:12:54,915 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 20:12:58,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1269880.0, ans=0.1 2024-08-11 20:13:10,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1269980.0, ans=0.5 2024-08-11 20:13:23,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1269980.0, ans=0.09899494936611666 2024-08-11 20:13:33,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.512e+01 2.845e+01 3.437e+01 6.269e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-11 20:13:41,514 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-11 20:13:46,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1270080.0, ans=0.1 2024-08-11 20:14:02,214 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 20:14:05,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1270180.0, ans=0.2 2024-08-11 20:14:31,883 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 19 from Vox, 37 from AS
2024-08-11 20:14:42,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0
2024-08-11 20:14:46,131 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11100, loss[loss=0.1161, beats_loss=0.01071, ecapa_loss=0.0001588, whisper_loss=0.1038, over 14769.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01114, ecapa_loss=0.0001916, whisper_loss=0.09329, over 3923389.65 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:14:51,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1270380.0, ans=15.0
2024-08-11 20:15:09,043 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 from AS
2024-08-11 20:15:09,322 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.167e-02
2024-08-11 20:15:09,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1270480.0, ans=0.125
2024-08-11 20:15:48,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1270580.0, ans=0.125
2024-08-11 20:15:50,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1270580.0, ans=0.0
2024-08-11 20:16:07,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1270680.0, ans=0.125
2024-08-11 20:16:47,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11150, loss[loss=0.1016, beats_loss=0.01214, ecapa_loss=0.0001859, whisper_loss=0.08758, over 19370.00 frames.
], tot_loss[loss=0.1055, beats_loss=0.01114, ecapa_loss=0.000191, whisper_loss=0.0924, over 3890528.87 frames. ], batch size: 80, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:17:19,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1270980.0, ans=0.125
2024-08-11 20:17:30,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1270980.0, ans=0.125
2024-08-11 20:17:39,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.507e+01 2.811e+01 3.221e+01 4.609e+01, threshold=5.623e+01, percent-clipped=0.0
2024-08-11 20:17:39,549 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 20:17:42,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1271080.0, ans=0.0
2024-08-11 20:17:48,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1271080.0, ans=0.125
2024-08-11 20:18:03,309 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 from AS
2024-08-11 20:18:20,181 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS
2024-08-11 20:18:20,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0
2024-08-11 20:18:23,633 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 from AS
2024-08-11 20:18:30,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0
2024-08-11 20:18:33,698 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
34 from LS+wenet, 20 from Vox, 38 from AS
2024-08-11 20:18:34,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-08-11 20:18:36,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11200, loss[loss=0.08332, beats_loss=0.01148, ecapa_loss=0.0002004, whisper_loss=0.06984, over 13983.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01112, ecapa_loss=0.0001912, whisper_loss=0.09222, over 3889071.81 frames. ], batch size: 57, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:18:37,062 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 from AS
2024-08-11 20:18:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1271380.0, ans=0.1
2024-08-11 20:18:55,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1271480.0, ans=0.1
2024-08-11 20:19:16,720 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 from AS
2024-08-11 20:19:52,511 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 from AS
2024-08-11 20:20:05,103 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11250, loss[loss=0.1185, beats_loss=0.00917, ecapa_loss=0.0002305, whisper_loss=0.107, over 22070.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.0001925, whisper_loss=0.09265, over 3900977.96 frames. ], batch size: 91, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:20:17,416 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts.
28 from LS+wenet, 19 from Vox, 33 from AS
2024-08-11 20:20:33,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1271980.0, ans=0.125
2024-08-11 20:20:38,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.603e+01 2.926e+01 3.414e+01 6.111e+01, threshold=5.851e+01, percent-clipped=1.0
2024-08-11 20:20:41,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=12.0
2024-08-11 20:20:47,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1272080.0, ans=0.125
2024-08-11 20:21:20,368 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 from AS
2024-08-11 20:21:25,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0
2024-08-11 20:21:34,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11300, loss[loss=0.09454, beats_loss=0.01329, ecapa_loss=0.0001414, whisper_loss=0.07983, over 17160.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01102, ecapa_loss=0.0001923, whisper_loss=0.09292, over 3894816.06 frames. ], batch size: 67, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:21:45,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1272380.0, ans=0.0
2024-08-11 20:21:57,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs.
limit=15.0
2024-08-11 20:22:12,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1272580.0, ans=0.1
2024-08-11 20:22:30,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1272680.0, ans=0.1
2024-08-11 20:22:33,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5
2024-08-11 20:22:44,362 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS
2024-08-11 20:22:49,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1272780.0, ans=0.0
2024-08-11 20:22:50,985 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 from AS
2024-08-11 20:23:03,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.33 vs. limit=22.5
2024-08-11 20:23:04,937 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11350, loss[loss=0.1071, beats_loss=0.01089, ecapa_loss=0.0001723, whisper_loss=0.09449, over 18554.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01095, ecapa_loss=0.0001914, whisper_loss=0.09373, over 3940704.95 frames. ], batch size: 73, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:23:05,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=12.0
2024-08-11 20:23:10,157 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
15 from LS+wenet, 12 from Vox, 27 from AS
2024-08-11 20:23:39,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.545e+01 2.892e+01 3.550e+01 1.179e+02, threshold=5.785e+01, percent-clipped=1.0
2024-08-11 20:23:47,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1273080.0, ans=0.125
2024-08-11 20:23:47,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1273080.0, ans=0.125
2024-08-11 20:24:06,037 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-11 20:24:17,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.26 vs. limit=22.5
2024-08-11 20:24:35,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11400, loss[loss=0.08995, beats_loss=0.01131, ecapa_loss=0.000227, whisper_loss=0.07636, over 14917.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01091, ecapa_loss=0.0001918, whisper_loss=0.09396, over 3904987.77 frames. ], batch size: 62, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:24:41,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1273380.0, ans=0.125
2024-08-11 20:24:45,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1273380.0, ans=0.0
2024-08-11 20:24:50,214 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 25 from LS+wenet, 15 from Vox, 16 from AS
2024-08-11 20:24:54,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1273480.0, ans=0.125
2024-08-11 20:25:02,138 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-11 20:25:12,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0
2024-08-11 20:25:35,293 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 12 from Vox, 31 from AS
2024-08-11 20:25:39,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1273680.0, ans=0.0
2024-08-11 20:25:49,909 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 from AS
2024-08-11 20:26:03,870 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11450, loss[loss=0.1024, beats_loss=0.0134, ecapa_loss=0.0001711, whisper_loss=0.08734, over 22288.00 frames. ], tot_loss[loss=0.107, beats_loss=0.011, ecapa_loss=0.0001911, whisper_loss=0.09408, over 3918591.78 frames. ], batch size: 92, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:26:07,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1273880.0, ans=0.125
2024-08-11 20:26:29,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1273980.0, ans=0.0
2024-08-11 20:26:38,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.725e+01 3.153e+01 3.598e+01 9.857e+01, threshold=6.305e+01, percent-clipped=2.0
2024-08-11 20:27:20,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1274280.0, ans=0.125
2024-08-11 20:27:23,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1274280.0, ans=0.125
2024-08-11 20:27:33,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11500, loss[loss=0.1003,
beats_loss=0.01016, ecapa_loss=0.0002242, whisper_loss=0.08789, over 21602.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01105, ecapa_loss=0.000192, whisper_loss=0.09405, over 3936436.40 frames. ], batch size: 89, lr: 6.91e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:27:41,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1274380.0, ans=0.125
2024-08-11 20:28:00,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1274480.0, ans=0.1
2024-08-11 20:28:00,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0
2024-08-11 20:28:38,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1274680.0, ans=0.125
2024-08-11 20:28:41,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1274680.0, ans=0.2
2024-08-11 20:28:47,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5
2024-08-11 20:28:57,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1274780.0, ans=0.125
2024-08-11 20:29:04,069 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 from AS
2024-08-11 20:29:07,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11550, loss[loss=0.1061, beats_loss=0.0127, ecapa_loss=0.0001677, whisper_loss=0.09172, over 22717.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01112, ecapa_loss=0.0001916, whisper_loss=0.09383, over 3922514.56 frames.
], batch size: 91, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:29:07,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1274880.0, ans=0.1
2024-08-11 20:29:43,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.690e+01 2.944e+01 3.463e+01 4.757e+01, threshold=5.888e+01, percent-clipped=0.0
2024-08-11 20:30:00,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1275180.0, ans=0.0
2024-08-11 20:30:17,872 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 from AS
2024-08-11 20:30:19,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1275280.0, ans=0.0
2024-08-11 20:30:34,361 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 31 from Vox, 39 from AS
2024-08-11 20:30:37,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11600, loss[loss=0.09675, beats_loss=0.01369, ecapa_loss=0.0001819, whisper_loss=0.08124, over 21028.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01115, ecapa_loss=0.0001913, whisper_loss=0.09315, over 3929797.39 frames.
], batch size: 88, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:30:47,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1275380.0, ans=0.1
2024-08-11 20:31:23,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1275580.0, ans=0.0
2024-08-11 20:31:42,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1275680.0, ans=0.125
2024-08-11 20:31:49,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1275780.0, ans=0.125
2024-08-11 20:31:52,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=22.5
2024-08-11 20:32:00,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0
2024-08-11 20:32:03,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1275780.0, ans=0.0
2024-08-11 20:32:06,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11650, loss[loss=0.1119, beats_loss=0.01126, ecapa_loss=0.0001334, whisper_loss=0.09929, over 18327.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01114, ecapa_loss=0.0001918, whisper_loss=0.09303, over 3895357.63 frames. ], batch size: 64, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:32:11,388 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 from AS
2024-08-11 20:32:12,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs.
limit=10.0
2024-08-11 20:32:25,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1275980.0, ans=0.95
2024-08-11 20:32:32,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1275980.0, ans=0.0
2024-08-11 20:32:44,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.149e+01 2.564e+01 2.809e+01 3.170e+01 4.570e+01, threshold=5.617e+01, percent-clipped=0.0
2024-08-11 20:32:44,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1276080.0, ans=0.2
2024-08-11 20:32:45,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0
2024-08-11 20:32:55,431 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 from AS
2024-08-11 20:33:07,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1276180.0, ans=0.125
2024-08-11 20:33:15,573 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-11 20:33:16,950 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 from AS
2024-08-11 20:33:17,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1276180.0, ans=0.2
2024-08-11 20:33:33,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs.
limit=22.5
2024-08-11 20:33:38,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1276280.0, ans=0.1
2024-08-11 20:33:43,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11700, loss[loss=0.0766, beats_loss=0.01212, ecapa_loss=0.0001639, whisper_loss=0.06285, over 18633.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01119, ecapa_loss=0.0001901, whisper_loss=0.09345, over 3921243.00 frames. ], batch size: 78, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:34:06,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1276480.0, ans=0.2
2024-08-11 20:34:18,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1276580.0, ans=0.125
2024-08-11 20:34:27,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1276580.0, ans=0.0
2024-08-11 20:34:30,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276580.0, ans=0.1
2024-08-11 20:34:32,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=15.0
2024-08-11 20:35:00,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1276780.0, ans=0.0
2024-08-11 20:35:03,508 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts.
25 from LS+wenet, 10 from Vox, 30 from AS
2024-08-11 20:35:09,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1276780.0, ans=0.125
2024-08-11 20:35:14,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11750, loss[loss=0.08738, beats_loss=0.0138, ecapa_loss=0.0001731, whisper_loss=0.07185, over 16720.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01122, ecapa_loss=0.0001893, whisper_loss=0.09329, over 3916827.39 frames. ], batch size: 69, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:35:22,650 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 from AS
2024-08-11 20:35:42,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1276980.0, ans=0.1
2024-08-11 20:35:49,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.651e+01 2.904e+01 3.391e+01 1.042e+02, threshold=5.808e+01, percent-clipped=2.0
2024-08-11 20:35:59,734 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS
2024-08-11 20:36:07,097 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 38 from LS+wenet, 22 from Vox, 31 from AS
2024-08-11 20:36:16,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1277180.0, ans=0.0
2024-08-11 20:36:17,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1277180.0, ans=0.125
2024-08-11 20:36:22,984 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
33 from LS+wenet, 20 from Vox, 40 from AS
2024-08-11 20:36:27,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1277280.0, ans=0.125
2024-08-11 20:36:31,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1277280.0, ans=0.125
2024-08-11 20:36:37,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5
2024-08-11 20:36:43,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11800, loss[loss=0.07945, beats_loss=0.0124, ecapa_loss=0.0001962, whisper_loss=0.06509, over 15203.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01122, ecapa_loss=0.0001885, whisper_loss=0.09373, over 3905900.43 frames. ], batch size: 64, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:36:54,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5
2024-08-11 20:37:02,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1277480.0, ans=0.1
2024-08-11 20:37:10,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1277480.0, ans=0.0
2024-08-11 20:37:35,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1277680.0, ans=0.125
2024-08-11 20:38:00,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1277780.0, ans=0.1
2024-08-11 20:38:12,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11850, loss[loss=0.1076, beats_loss=0.01138, ecapa_loss=0.0001416, whisper_loss=0.09483, over 23532.00 frames.
], tot_loss[loss=0.1063, beats_loss=0.01132, ecapa_loss=0.0001895, whisper_loss=0.09306, over 3937122.19 frames. ], batch size: 91, lr: 6.90e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:38:13,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1277880.0, ans=0.0
2024-08-11 20:38:18,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1277880.0, ans=0.0
2024-08-11 20:38:34,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1277980.0, ans=0.125
2024-08-11 20:38:43,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.029e+01 2.625e+01 2.967e+01 3.340e+01 5.309e+01, threshold=5.933e+01, percent-clipped=0.0
2024-08-11 20:38:56,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1278080.0, ans=0.125
2024-08-11 20:39:34,070 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 26 from Vox, 47 from AS
2024-08-11 20:39:38,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11900, loss[loss=0.0832, beats_loss=0.0136, ecapa_loss=0.0001359, whisper_loss=0.06824, over 19236.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0114, ecapa_loss=0.0001881, whisper_loss=0.09288, over 3983592.00 frames.
], batch size: 79, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:39:40,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1278380.0, ans=0.125
2024-08-11 20:40:05,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1278480.0, ans=0.125
2024-08-11 20:40:08,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0
2024-08-11 20:40:20,261 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 from AS
2024-08-11 20:40:27,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1278580.0, ans=0.125
2024-08-11 20:40:57,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1278780.0, ans=10.0
2024-08-11 20:41:03,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 11950, loss[loss=0.1112, beats_loss=0.01066, ecapa_loss=0.0002038, whisper_loss=0.09846, over 15826.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01133, ecapa_loss=0.0001876, whisper_loss=0.09337, over 3952743.97 frames. ], batch size: 64, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:41:03,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1278880.0, ans=0.125
2024-08-11 20:41:07,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.09 vs.
limit=22.5
2024-08-11 20:41:37,067 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.570e+01 2.836e+01 3.237e+01 6.228e+01, threshold=5.672e+01, percent-clipped=0.0
2024-08-11 20:41:39,464 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.236e+00
2024-08-11 20:42:01,082 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS
2024-08-11 20:42:20,905 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS
2024-08-11 20:42:33,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12000, loss[loss=0.1067, beats_loss=0.01046, ecapa_loss=0.0002099, whisper_loss=0.09412, over 21905.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01121, ecapa_loss=0.0001875, whisper_loss=0.09354, over 3927137.75 frames. ], batch size: 86, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:42:33,202 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-11 20:43:16,147 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0006123, whisper_loss=0.25, over 922467.00 frames.
2024-08-11 20:43:35,251 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on SV_voxceleb1: loss=0.005094, beats_loss=0, ecapa_loss=0.0005094, whisper_loss=0, over 939242.00 frames.
2024-08-11 20:45:30,563 INFO [train_multi_KD3.py:1149] (0/4) Epoch 9, validation on AT_audioset: loss=0.02487, beats_loss=0.02487, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-11 20:45:30,568 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-11 20:45:39,969 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
28 from LS+wenet, 21 from Vox, 45 from AS
2024-08-11 20:45:40,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1279380.0, ans=0.125
2024-08-11 20:45:55,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5
2024-08-11 20:46:05,057 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS
2024-08-11 20:46:14,747 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-11 20:46:30,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1279680.0, ans=0.125
2024-08-11 20:46:38,781 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 from AS
2024-08-11 20:46:58,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1279880.0, ans=0.125
2024-08-11 20:46:59,083 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12050, loss[loss=0.1221, beats_loss=0.01052, ecapa_loss=0.0001656, whisper_loss=0.11, over 20319.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.0112, ecapa_loss=0.0001877, whisper_loss=0.09337, over 3883551.98 frames.
], batch size: 78, lr: 6.89e-03, grad_scale: 1.152921504606847e+18
2024-08-11 20:47:17,251 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-128000.pt
2024-08-11 20:47:27,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1279980.0, ans=0.125
2024-08-11 20:47:32,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.178e+01 2.712e+01 3.113e+01 3.609e+01 6.588e+01, threshold=6.227e+01, percent-clipped=3.0
2024-08-11 20:47:43,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1280080.0, ans=0.125
2024-08-11 20:47:44,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5
2024-08-11 20:47:51,773 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 from AS
2024-08-11 20:48:27,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12100, loss[loss=0.07356, beats_loss=0.01512, ecapa_loss=0.0001681, whisper_loss=0.05675, over 16041.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01116, ecapa_loss=0.0001886, whisper_loss=0.09341, over 3879699.76 frames.
], batch size: 65, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:48:31,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1280380.0, ans=0.125 2024-08-11 20:48:34,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1280380.0, ans=0.0 2024-08-11 20:48:43,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1280480.0, ans=0.0 2024-08-11 20:48:58,202 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 20:49:05,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1280580.0, ans=0.2 2024-08-11 20:49:09,896 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.362e-01 2024-08-11 20:49:15,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1280580.0, ans=0.0 2024-08-11 20:49:28,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1280680.0, ans=0.125 2024-08-11 20:49:43,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1280780.0, ans=0.0 2024-08-11 20:49:55,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12150, loss[loss=0.1192, beats_loss=0.01188, ecapa_loss=0.0001971, whisper_loss=0.1053, over 22128.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.00019, whisper_loss=0.09322, over 3853361.36 frames. ], batch size: 87, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:49:58,863 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 20:50:25,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2024-08-11 20:50:26,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.646e+01 3.020e+01 3.424e+01 5.278e+01, threshold=6.041e+01, percent-clipped=0.0 2024-08-11 20:50:35,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1281080.0, ans=15.0 2024-08-11 20:50:41,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1281080.0, ans=0.0 2024-08-11 20:50:44,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1281180.0, ans=0.0 2024-08-11 20:51:11,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-11 20:51:12,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1281280.0, ans=0.035 2024-08-11 20:51:16,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1281280.0, ans=0.1 2024-08-11 20:51:18,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1281380.0, ans=0.125 2024-08-11 20:51:18,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1281380.0, ans=0.125 2024-08-11 20:51:19,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12200, loss[loss=0.08619, beats_loss=0.01597, ecapa_loss=0.000129, whisper_loss=0.06893, over 17150.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001889, whisper_loss=0.0926, over 3845141.80 frames. ], batch size: 67, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:51:26,374 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-11 20:51:26,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1281380.0, ans=0.125 2024-08-11 20:51:58,349 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-11 20:52:06,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1281580.0, ans=0.125 2024-08-11 20:52:40,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1281780.0, ans=0.0 2024-08-11 20:52:40,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2024-08-11 20:52:43,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12250, loss[loss=0.1279, beats_loss=0.01083, ecapa_loss=0.0001997, whisper_loss=0.1151, over 22279.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01116, ecapa_loss=0.0001876, whisper_loss=0.09286, over 3878227.37 frames. ], batch size: 88, lr: 6.89e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:52:43,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1281880.0, ans=0.125 2024-08-11 20:53:06,097 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
16 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-11 20:53:09,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1281980.0, ans=0.0 2024-08-11 20:53:16,093 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+01 2.578e+01 2.932e+01 3.420e+01 1.649e+02, threshold=5.864e+01, percent-clipped=1.0 2024-08-11 20:53:16,339 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 20:53:53,199 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-11 20:53:53,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1282280.0, ans=0.5 2024-08-11 20:53:53,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1282280.0, ans=0.0 2024-08-11 20:54:03,957 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 20:54:04,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-11 20:54:06,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1282280.0, ans=0.1 2024-08-11 20:54:08,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12300, loss[loss=0.07434, beats_loss=0.01292, ecapa_loss=0.0002394, whisper_loss=0.05902, over 21459.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01115, ecapa_loss=0.0001895, whisper_loss=0.09263, over 3888152.19 frames. ], batch size: 95, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:54:10,758 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-11 20:54:13,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1282380.0, ans=0.02 2024-08-11 20:54:27,858 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 20:54:41,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1282580.0, ans=0.125 2024-08-11 20:54:51,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1282580.0, ans=0.0 2024-08-11 20:55:15,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1282680.0, ans=0.125 2024-08-11 20:55:17,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2024-08-11 20:55:25,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1282780.0, ans=0.025 2024-08-11 20:55:26,829 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-11 20:55:35,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12350, loss[loss=0.1152, beats_loss=0.008355, ecapa_loss=0.0001949, whisper_loss=0.1049, over 17458.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001908, whisper_loss=0.09307, over 3922945.76 frames. ], batch size: 66, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:55:42,105 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-11 20:55:42,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1282880.0, ans=0.5 2024-08-11 20:55:46,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1282880.0, ans=0.1 2024-08-11 20:56:06,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.524e+01 2.954e+01 3.299e+01 5.655e+01, threshold=5.908e+01, percent-clipped=0.0 2024-08-11 20:56:07,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1283080.0, ans=0.125 2024-08-11 20:56:25,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-11 20:56:57,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1283280.0, ans=0.125 2024-08-11 20:57:01,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12400, loss[loss=0.0901, beats_loss=0.01272, ecapa_loss=0.0002066, whisper_loss=0.07531, over 17459.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001898, whisper_loss=0.09266, over 3901398.28 frames. ], batch size: 72, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:57:01,394 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-11 20:57:04,224 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 20:57:21,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1283480.0, ans=0.125 2024-08-11 20:57:38,500 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-11 20:57:59,836 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-11 20:58:03,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1283680.0, ans=0.2 2024-08-11 20:58:09,180 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-11 20:58:11,362 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-11 20:58:15,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1283780.0, ans=0.0 2024-08-11 20:58:20,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-11 20:58:23,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1283780.0, ans=0.2 2024-08-11 20:58:25,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12450, loss[loss=0.1185, beats_loss=0.01031, ecapa_loss=0.0002148, whisper_loss=0.1061, over 21766.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01108, ecapa_loss=0.0001905, whisper_loss=0.09302, over 3885061.64 frames. 
], batch size: 86, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:58:27,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1283880.0, ans=0.125 2024-08-11 20:58:48,228 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-11 20:58:49,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1283980.0, ans=0.125 2024-08-11 20:58:56,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.629e+01 2.973e+01 3.425e+01 5.618e+01, threshold=5.946e+01, percent-clipped=0.0 2024-08-11 20:59:18,351 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 20:59:35,108 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-11 20:59:35,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1284280.0, ans=0.0 2024-08-11 20:59:36,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1284280.0, ans=15.0 2024-08-11 20:59:48,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12500, loss[loss=0.1065, beats_loss=0.01166, ecapa_loss=0.0001642, whisper_loss=0.09321, over 15687.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.000189, whisper_loss=0.09321, over 3888928.64 frames. 
], batch size: 61, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 20:59:50,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1284380.0, ans=0.0 2024-08-11 20:59:52,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-11 20:59:57,446 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-11 21:00:09,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.54 vs. limit=10.0 2024-08-11 21:00:11,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1284480.0, ans=0.125 2024-08-11 21:00:17,329 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 21:00:25,972 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 21:00:27,198 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
19 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-11 21:00:35,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1284580.0, ans=0.2 2024-08-11 21:00:50,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1284680.0, ans=0.125 2024-08-11 21:00:51,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1284680.0, ans=0.125 2024-08-11 21:00:53,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1284680.0, ans=0.05 2024-08-11 21:01:14,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12550, loss[loss=0.1116, beats_loss=0.01117, ecapa_loss=0.0001801, whisper_loss=0.09863, over 23788.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01113, ecapa_loss=0.0001899, whisper_loss=0.0932, over 3903136.07 frames. ], batch size: 92, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:01:28,632 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 21:01:44,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.709e+01 3.086e+01 3.503e+01 6.566e+01, threshold=6.173e+01, percent-clipped=1.0 2024-08-11 21:02:22,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1285280.0, ans=0.025 2024-08-11 21:02:25,304 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 21:02:35,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12600, loss[loss=0.1114, beats_loss=0.008822, ecapa_loss=0.0001736, whisper_loss=0.1009, over 15033.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01116, ecapa_loss=0.0001896, whisper_loss=0.09353, over 3909525.37 frames. 
], batch size: 55, lr: 6.88e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:03:24,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1285680.0, ans=0.0 2024-08-11 21:03:26,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1285680.0, ans=0.125 2024-08-11 21:03:27,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1285680.0, ans=0.125 2024-08-11 21:03:57,181 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12650, loss[loss=0.1041, beats_loss=0.01126, ecapa_loss=0.0002198, whisper_loss=0.09067, over 21975.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01116, ecapa_loss=0.0001897, whisper_loss=0.09378, over 3896861.51 frames. ], batch size: 91, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:04:00,299 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 20 from Vox, 52 fro AS 2024-08-11 21:04:02,805 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.090e-01 2024-08-11 21:04:03,888 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-11 21:04:23,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1285980.0, ans=0.95 2024-08-11 21:04:31,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.568e+01 2.843e+01 3.370e+01 6.340e+01, threshold=5.685e+01, percent-clipped=1.0 2024-08-11 21:04:43,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1286080.0, ans=0.125 2024-08-11 21:04:45,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1286080.0, ans=0.0 2024-08-11 21:04:58,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1286180.0, ans=0.2 2024-08-11 21:05:08,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1286280.0, ans=0.125 2024-08-11 21:05:18,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1286280.0, ans=0.125 2024-08-11 21:05:19,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-11 21:05:25,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12700, loss[loss=0.09973, beats_loss=0.01225, ecapa_loss=0.0001492, whisper_loss=0.086, over 17207.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01115, ecapa_loss=0.0001903, whisper_loss=0.09389, over 3898265.06 frames. ], batch size: 67, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:05:35,269 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-11 21:05:43,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1286480.0, ans=0.125 2024-08-11 21:06:37,256 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-11 21:06:38,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1286780.0, ans=0.125 2024-08-11 21:06:42,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2024-08-11 21:06:47,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12750, loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001621, whisper_loss=0.09084, over 15485.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0112, ecapa_loss=0.0001906, whisper_loss=0.09353, over 3876145.45 frames. ], batch size: 58, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:06:59,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1286880.0, ans=0.0 2024-08-11 21:07:08,712 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 21:07:19,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.648e+01 3.001e+01 3.436e+01 1.023e+02, threshold=6.002e+01, percent-clipped=1.0 2024-08-11 21:07:38,204 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 21:07:42,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1287180.0, ans=0.125 2024-08-11 21:07:46,197 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-11 21:08:00,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1287280.0, ans=0.0 2024-08-11 21:08:01,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1287280.0, ans=0.0 2024-08-11 21:08:04,664 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-11 21:08:10,976 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 21:08:15,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12800, loss[loss=0.1143, beats_loss=0.01045, ecapa_loss=0.000244, whisper_loss=0.1014, over 18219.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01131, ecapa_loss=0.0001918, whisper_loss=0.09283, over 3869887.31 frames. ], batch size: 74, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:08:21,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=1287380.0, ans=10.0 2024-08-11 21:08:25,499 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-11 21:08:50,129 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 21:08:52,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1287580.0, ans=0.125 2024-08-11 21:09:18,965 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-11 21:09:22,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1287780.0, ans=0.1 2024-08-11 21:09:24,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1287780.0, ans=0.05 2024-08-11 21:09:36,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12850, loss[loss=0.112, beats_loss=0.01271, ecapa_loss=0.0002015, whisper_loss=0.09723, over 22256.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01135, ecapa_loss=0.000191, whisper_loss=0.09244, over 3847037.09 frames. ], batch size: 92, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:10:08,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1287980.0, ans=0.125 2024-08-11 21:10:09,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.541e+01 2.885e+01 3.297e+01 4.788e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-11 21:10:10,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2024-08-11 21:10:11,320 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-11 21:10:19,892 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
15 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-11 21:10:25,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1288080.0, ans=0.125 2024-08-11 21:10:50,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=1288280.0, ans=0.1 2024-08-11 21:11:00,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12900, loss[loss=0.09272, beats_loss=0.01263, ecapa_loss=0.0001932, whisper_loss=0.07815, over 22363.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0114, ecapa_loss=0.0001898, whisper_loss=0.09152, over 3849348.72 frames. ], batch size: 93, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:11:07,374 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 21:11:16,305 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-11 21:11:22,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1288480.0, ans=0.125 2024-08-11 21:11:44,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1288580.0, ans=0.2 2024-08-11 21:11:59,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1288680.0, ans=0.125 2024-08-11 21:12:02,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1288680.0, ans=0.0 2024-08-11 21:12:04,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1288780.0, ans=0.125 2024-08-11 21:12:16,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1288780.0, ans=0.0 2024-08-11 21:12:21,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 12950, 
loss[loss=0.1342, beats_loss=0.007773, ecapa_loss=0.0001917, whisper_loss=0.1245, over 24012.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01127, ecapa_loss=0.0001907, whisper_loss=0.09198, over 3856827.31 frames. ], batch size: 91, lr: 6.87e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:12:22,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1288880.0, ans=0.125 2024-08-11 21:12:26,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=12.0 2024-08-11 21:12:36,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.25 vs. limit=22.5 2024-08-11 21:12:50,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1288980.0, ans=0.125 2024-08-11 21:12:54,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.715e+01 3.125e+01 3.606e+01 5.827e+01, threshold=6.249e+01, percent-clipped=1.0 2024-08-11 21:12:57,527 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-11 21:13:00,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-11 21:13:17,334 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-11 21:13:31,561 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-11 21:13:45,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13000, loss[loss=0.09828, beats_loss=0.01048, ecapa_loss=0.0002402, whisper_loss=0.0854, over 21584.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.01121, ecapa_loss=0.0001932, whisper_loss=0.09216, over 3866869.06 frames. ], batch size: 91, lr: 6.87e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:13:46,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1289380.0, ans=0.07 2024-08-11 21:13:47,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1289380.0, ans=0.125 2024-08-11 21:14:24,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1289580.0, ans=0.1 2024-08-11 21:14:32,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-11 21:14:52,864 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-11 21:15:07,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-11 21:15:12,087 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13050, loss[loss=0.1038, beats_loss=0.01124, ecapa_loss=0.0002253, whisper_loss=0.09029, over 21752.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01126, ecapa_loss=0.0001928, whisper_loss=0.09137, over 3828421.70 frames. ], batch size: 89, lr: 6.86e-03, grad_scale: 2.305843009213694e+18 2024-08-11 21:15:20,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-11 21:15:34,482 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 21:15:43,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2024-08-11 21:15:43,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.491e+01 2.763e+01 3.152e+01 5.442e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-11 21:15:57,332 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 21:16:13,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-11 21:16:20,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1290280.0, ans=0.125 2024-08-11 21:16:31,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-11 21:16:34,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13100, loss[loss=0.08599, beats_loss=0.01107, ecapa_loss=0.0002421, whisper_loss=0.0725, over 20503.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01129, ecapa_loss=0.0001915, whisper_loss=0.09167, over 3828894.52 frames. ], batch size: 91, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:16:37,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. 
limit=15.0 2024-08-11 21:16:38,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1290380.0, ans=0.0 2024-08-11 21:16:40,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1290380.0, ans=0.125 2024-08-11 21:16:47,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1290380.0, ans=0.125 2024-08-11 21:16:54,592 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 21:17:04,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1290480.0, ans=0.0 2024-08-11 21:17:05,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1290480.0, ans=0.125 2024-08-11 21:17:06,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2024-08-11 21:17:10,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1290580.0, ans=0.125 2024-08-11 21:17:27,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1290680.0, ans=0.0 2024-08-11 21:17:28,301 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-11 21:17:46,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1290780.0, ans=0.5 2024-08-11 21:18:02,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13150, loss[loss=0.1069, beats_loss=0.009564, ecapa_loss=0.0001606, whisper_loss=0.09569, over 16817.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.0113, ecapa_loss=0.0001894, whisper_loss=0.09109, over 3838019.33 frames. ], batch size: 62, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:18:11,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2024-08-11 21:18:13,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.31 vs. limit=22.5 2024-08-11 21:18:22,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1290980.0, ans=0.125 2024-08-11 21:18:24,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1290980.0, ans=0.0 2024-08-11 21:18:36,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.521e+01 2.887e+01 3.350e+01 6.017e+01, threshold=5.775e+01, percent-clipped=1.0 2024-08-11 21:18:36,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1291080.0, ans=0.125 2024-08-11 21:18:38,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-11 21:18:43,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1291080.0, ans=0.2 2024-08-11 21:18:51,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.91 vs. 
limit=12.0 2024-08-11 21:18:53,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1291180.0, ans=0.0 2024-08-11 21:18:54,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2024-08-11 21:19:01,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-08-11 21:19:15,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=12.0 2024-08-11 21:19:16,080 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 21:19:24,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13200, loss[loss=0.07738, beats_loss=0.01207, ecapa_loss=0.0002461, whisper_loss=0.06285, over 19399.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0113, ecapa_loss=0.0001887, whisper_loss=0.09093, over 3824255.56 frames. ], batch size: 82, lr: 6.86e-03, grad_scale: 1.152921504606847e+18 2024-08-11 21:19:34,036 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 35 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-11 21:19:49,581 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 21:19:50,816 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-11 21:20:05,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1291580.0, ans=0.0 2024-08-11 21:20:18,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1291680.0, ans=0.0 2024-08-11 21:20:42,020 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-11 21:20:43,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-11 21:20:48,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13250, loss[loss=0.1134, beats_loss=0.008793, ecapa_loss=0.0001756, whisper_loss=0.1028, over 24404.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01122, ecapa_loss=0.0001886, whisper_loss=0.09137, over 3811407.67 frames. ], batch size: 91, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:20:54,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1291880.0, ans=0.1 2024-08-11 21:21:01,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2024-08-11 21:21:02,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1291980.0, ans=0.0 2024-08-11 21:21:07,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1291980.0, ans=0.2 2024-08-11 21:21:21,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.554e+01 3.002e+01 3.444e+01 4.623e+01, threshold=6.004e+01, percent-clipped=0.0 2024-08-11 21:21:47,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-11 21:21:56,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1292280.0, ans=0.0 2024-08-11 21:22:03,056 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 26 from Vox, 16 fro AS 2024-08-11 21:22:05,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13300, loss[loss=0.1025, beats_loss=0.009969, ecapa_loss=0.000235, whisper_loss=0.0902, over 19635.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01111, ecapa_loss=0.0001897, whisper_loss=0.09231, over 3827730.78 frames. ], batch size: 81, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:22:06,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1292380.0, ans=0.0 2024-08-11 21:22:11,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-08-11 21:22:13,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1292380.0, ans=0.125 2024-08-11 21:22:20,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1292380.0, ans=0.2 2024-08-11 21:22:27,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1292480.0, ans=0.125 2024-08-11 21:22:46,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1292580.0, ans=0.125 2024-08-11 21:22:49,315 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-11 21:22:59,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. 
limit=22.5 2024-08-11 21:23:02,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1292680.0, ans=0.125 2024-08-11 21:23:08,148 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 21:23:12,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1292780.0, ans=0.125 2024-08-11 21:23:17,916 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 21:23:24,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1292880.0, ans=0.0 2024-08-11 21:23:25,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13350, loss[loss=0.0995, beats_loss=0.01193, ecapa_loss=0.0002022, whisper_loss=0.08555, over 21818.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001901, whisper_loss=0.09223, over 3840069.07 frames. ], batch size: 92, lr: 6.86e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:23:25,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=1292880.0, ans=0.02 2024-08-11 21:23:25,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1292880.0, ans=0.125 2024-08-11 21:23:36,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-08-11 21:23:51,582 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 21:23:52,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1292980.0, ans=0.125 2024-08-11 21:23:53,202 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 21:23:55,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.575e+01 2.972e+01 3.296e+01 7.873e+01, threshold=5.944e+01, percent-clipped=3.0 2024-08-11 21:24:02,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1293080.0, ans=0.1 2024-08-11 21:24:06,401 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-11 21:24:13,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1293180.0, ans=0.125 2024-08-11 21:24:16,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1293180.0, ans=0.125 2024-08-11 21:24:30,353 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 21:24:37,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13400, loss[loss=0.1083, beats_loss=0.01364, ecapa_loss=0.0001758, whisper_loss=0.09294, over 22106.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01116, ecapa_loss=0.0001903, whisper_loss=0.09246, over 3857342.95 frames. ], batch size: 88, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:24:48,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-08-11 21:24:55,451 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 21:24:59,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1293480.0, ans=0.125 2024-08-11 21:25:03,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1293580.0, ans=0.125 2024-08-11 21:25:17,213 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-11 21:25:19,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-11 21:25:24,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1293680.0, ans=0.125 2024-08-11 21:25:46,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13450, loss[loss=0.1161, beats_loss=0.009413, ecapa_loss=0.0002206, whisper_loss=0.1045, over 18311.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.000191, whisper_loss=0.09163, over 3861137.59 frames. ], batch size: 69, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:25:56,179 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-11 21:26:03,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1293980.0, ans=0.125 2024-08-11 21:26:07,503 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 21:26:15,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.605e+01 2.918e+01 3.272e+01 4.452e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-11 21:26:37,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1294180.0, ans=0.2 2024-08-11 21:26:41,049 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-11 21:26:45,387 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-11 21:26:55,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13500, loss[loss=0.116, beats_loss=0.008949, ecapa_loss=0.0002409, whisper_loss=0.1047, over 17573.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01119, ecapa_loss=0.0001903, whisper_loss=0.0921, over 3878861.33 frames. ], batch size: 70, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:27:03,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1294380.0, ans=0.0 2024-08-11 21:27:14,864 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-11 21:27:25,541 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 14 from Vox, 50 fro AS 2024-08-11 21:27:26,819 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 21:27:28,396 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-11 21:28:03,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13550, loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0002047, whisper_loss=0.09218, over 21975.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01123, ecapa_loss=0.0001902, whisper_loss=0.09165, over 3883491.35 frames. 
], batch size: 90, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:28:13,308 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 21:28:32,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.573e+01 2.919e+01 3.325e+01 1.633e+02, threshold=5.839e+01, percent-clipped=1.0 2024-08-11 21:28:32,626 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-11 21:28:35,087 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-11 21:28:38,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1295080.0, ans=0.04949747468305833 2024-08-11 21:28:42,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1295080.0, ans=0.2 2024-08-11 21:28:51,890 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 21:29:12,054 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13600, loss[loss=0.09849, beats_loss=0.01215, ecapa_loss=0.000158, whisper_loss=0.08476, over 16793.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01124, ecapa_loss=0.000188, whisper_loss=0.09208, over 3902451.04 frames. ], batch size: 62, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:29:26,015 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-11 21:29:37,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.38 vs. limit=10.0 2024-08-11 21:29:41,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.53 vs. 
limit=15.0 2024-08-11 21:29:45,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.35 vs. limit=15.0 2024-08-11 21:30:09,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1295780.0, ans=0.05 2024-08-11 21:30:14,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2024-08-11 21:30:16,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1295780.0, ans=0.2 2024-08-11 21:30:20,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13650, loss[loss=0.08556, beats_loss=0.01325, ecapa_loss=0.0001595, whisper_loss=0.07071, over 15798.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01128, ecapa_loss=0.0001895, whisper_loss=0.09247, over 3895629.89 frames. ], batch size: 62, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:30:20,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1295880.0, ans=0.0 2024-08-11 21:30:21,866 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 21:30:40,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-08-11 21:30:42,255 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-11 21:30:43,592 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-11 21:30:46,299 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 21:30:48,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.503e+01 2.904e+01 3.318e+01 5.006e+01, threshold=5.809e+01, percent-clipped=0.0 2024-08-11 21:31:12,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2024-08-11 21:31:19,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1296280.0, ans=0.1 2024-08-11 21:31:19,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1296280.0, ans=0.05 2024-08-11 21:31:28,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13700, loss[loss=0.07475, beats_loss=0.009922, ecapa_loss=0.0002053, whisper_loss=0.06278, over 17581.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001899, whisper_loss=0.09324, over 3910087.75 frames. ], batch size: 72, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:31:28,814 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-11 21:31:51,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1296480.0, ans=0.125 2024-08-11 21:31:54,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1296480.0, ans=0.125 2024-08-11 21:32:04,455 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.648e+00 2024-08-11 21:32:13,938 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 21:32:14,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1296680.0, ans=0.2 2024-08-11 21:32:28,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1296780.0, ans=0.2 2024-08-11 21:32:33,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1296780.0, ans=0.2 2024-08-11 21:32:37,272 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-11 21:32:38,433 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13750, loss[loss=0.09132, beats_loss=0.01127, ecapa_loss=0.0002086, whisper_loss=0.07796, over 21699.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01119, ecapa_loss=0.000191, whisper_loss=0.09265, over 3895109.95 frames. ], batch size: 87, lr: 6.85e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:32:39,002 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.292e+05 2024-08-11 21:32:51,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1296980.0, ans=0.0 2024-08-11 21:32:57,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.55 vs. limit=22.5 2024-08-11 21:32:58,199 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 21:33:06,560 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-11 21:33:07,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.113e+01 2.561e+01 2.855e+01 3.257e+01 5.078e+01, threshold=5.711e+01, percent-clipped=0.0 2024-08-11 21:33:10,074 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.929e-01 2024-08-11 21:33:18,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1297080.0, ans=0.0 2024-08-11 21:33:22,085 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-11 21:33:32,480 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-11 21:33:36,644 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-11 21:33:36,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1297280.0, ans=0.0 2024-08-11 21:33:48,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13800, loss[loss=0.09057, beats_loss=0.0112, ecapa_loss=0.0002258, whisper_loss=0.07711, over 21155.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01116, ecapa_loss=0.0001899, whisper_loss=0.09315, over 3883979.66 frames. ], batch size: 88, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:33:48,803 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-11 21:34:19,657 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-11 21:34:22,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1297580.0, ans=0.125 2024-08-11 21:34:30,330 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 21:34:36,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1297680.0, ans=15.0 2024-08-11 21:34:40,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1297680.0, ans=0.125 2024-08-11 21:34:43,963 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-11 21:34:44,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-11 21:34:45,338 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 39 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 21:34:57,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13850, loss[loss=0.102, beats_loss=0.0129, ecapa_loss=0.0001685, whisper_loss=0.08739, over 21049.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01116, ecapa_loss=0.0001898, whisper_loss=0.09318, over 3882305.26 frames. ], batch size: 87, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:35:26,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.750e+01 3.088e+01 3.546e+01 6.102e+01, threshold=6.176e+01, percent-clipped=2.0 2024-08-11 21:35:29,014 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 21:35:33,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1298080.0, ans=10.0 2024-08-11 21:35:41,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:35:42,346 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-11 21:35:49,414 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 26 from LS+wenet, 7 from Vox, 25 fro AS 2024-08-11 21:35:54,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=8.0 2024-08-11 21:36:05,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13900, loss[loss=0.08923, beats_loss=0.01345, ecapa_loss=0.0001485, whisper_loss=0.07429, over 14015.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01118, ecapa_loss=0.0001889, whisper_loss=0.09351, over 3884321.39 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:36:07,488 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 10 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 21:36:07,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1298380.0, ans=0.0 2024-08-11 21:36:15,696 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-11 21:36:19,888 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 23 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-11 21:36:31,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5 2024-08-11 21:36:45,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. 
limit=6.0 2024-08-11 21:36:53,304 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:37:00,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1298780.0, ans=0.125 2024-08-11 21:37:14,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 13950, loss[loss=0.1043, beats_loss=0.01306, ecapa_loss=0.0001668, whisper_loss=0.08956, over 20510.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=0.0001865, whisper_loss=0.09333, over 3893921.63 frames. ], batch size: 82, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:37:22,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1298880.0, ans=0.1 2024-08-11 21:37:35,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1298980.0, ans=0.125 2024-08-11 21:37:43,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+01 2.710e+01 3.048e+01 3.326e+01 4.854e+01, threshold=6.095e+01, percent-clipped=0.0 2024-08-11 21:38:00,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1299180.0, ans=0.125 2024-08-11 21:38:07,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-08-11 21:38:23,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14000, loss[loss=0.1268, beats_loss=0.009755, ecapa_loss=0.0002571, whisper_loss=0.1145, over 20553.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01121, ecapa_loss=0.0001873, whisper_loss=0.09308, over 3890611.68 frames. 
], batch size: 87, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:38:24,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-11 21:39:12,090 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 21:39:17,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1299780.0, ans=0.2 2024-08-11 21:39:19,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1299780.0, ans=0.0 2024-08-11 21:39:23,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1299780.0, ans=0.1 2024-08-11 21:39:32,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14050, loss[loss=0.07998, beats_loss=0.01406, ecapa_loss=0.0002177, whisper_loss=0.06374, over 21786.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0113, ecapa_loss=0.0001874, whisper_loss=0.09313, over 3897041.59 frames. ], batch size: 93, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:39:37,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1299880.0, ans=0.0 2024-08-11 21:39:39,922 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 21:39:57,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-08-11 21:40:00,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.32 vs. 
limit=10.0 2024-08-11 21:40:01,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+01 2.630e+01 2.929e+01 3.311e+01 9.104e+01, threshold=5.859e+01, percent-clipped=1.0 2024-08-11 21:40:09,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1300080.0, ans=0.1 2024-08-11 21:40:41,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14100, loss[loss=0.1173, beats_loss=0.009616, ecapa_loss=0.0001796, whisper_loss=0.1059, over 17628.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01133, ecapa_loss=0.0001875, whisper_loss=0.0929, over 3910371.24 frames. ], batch size: 66, lr: 6.84e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:40:47,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1300380.0, ans=0.1 2024-08-11 21:40:51,295 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-11 21:41:12,081 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-11 21:41:16,203 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-11 21:41:19,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1300580.0, ans=0.125 2024-08-11 21:41:35,846 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 21:41:49,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1300880.0, ans=0.1 2024-08-11 21:41:50,722 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14150, loss[loss=0.116, beats_loss=0.009745, ecapa_loss=0.0001719, whisper_loss=0.1046, over 22854.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.0114, ecapa_loss=0.0001866, whisper_loss=0.09257, over 3902385.82 frames. ], batch size: 89, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:42:13,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1300980.0, ans=0.125 2024-08-11 21:42:19,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.615e+01 2.850e+01 3.033e+01 5.082e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-11 21:42:24,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0 2024-08-11 21:42:41,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1301180.0, ans=0.2 2024-08-11 21:42:43,201 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-11 21:42:45,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1301280.0, ans=0.0 2024-08-11 21:42:54,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1301280.0, ans=0.1 2024-08-11 21:42:58,188 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-11 21:42:59,583 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14200, loss[loss=0.09342, beats_loss=0.01478, ecapa_loss=0.0001636, whisper_loss=0.077, over 22013.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01145, ecapa_loss=0.0001868, whisper_loss=0.09258, over 3919515.14 frames. 
], batch size: 90, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:43:02,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1301380.0, ans=0.125 2024-08-11 21:43:03,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-11 21:43:09,204 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 21:43:09,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1301380.0, ans=0.125 2024-08-11 21:43:11,629 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-11 21:43:22,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1301480.0, ans=0.125 2024-08-11 21:43:22,924 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.853e-03 2024-08-11 21:43:30,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-11 21:43:38,035 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 21:43:39,582 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 21:43:43,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1301680.0, ans=0.125 2024-08-11 21:43:54,298 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
39 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-11 21:44:03,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1301780.0, ans=0.125 2024-08-11 21:44:07,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-11 21:44:08,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14250, loss[loss=0.115, beats_loss=0.009904, ecapa_loss=0.000204, whisper_loss=0.1031, over 23711.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01138, ecapa_loss=0.0001879, whisper_loss=0.09331, over 3922373.01 frames. ], batch size: 95, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:44:11,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1301880.0, ans=0.0 2024-08-11 21:44:26,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1301980.0, ans=0.125 2024-08-11 21:44:29,287 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-11 21:44:29,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1301980.0, ans=0.125 2024-08-11 21:44:36,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1302080.0, ans=0.2 2024-08-11 21:44:37,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.747e+01 3.033e+01 3.629e+01 5.919e+01, threshold=6.067e+01, percent-clipped=2.0 2024-08-11 21:44:57,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.72 vs. 
limit=10.0 2024-08-11 21:44:59,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-11 21:45:02,913 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 14 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-11 21:45:11,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1302280.0, ans=0.125 2024-08-11 21:45:18,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14300, loss[loss=0.1136, beats_loss=0.01264, ecapa_loss=0.0001657, whisper_loss=0.09928, over 17350.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01146, ecapa_loss=0.0001852, whisper_loss=0.09253, over 3934560.24 frames. ], batch size: 67, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:45:31,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1302480.0, ans=0.0 2024-08-11 21:46:04,064 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-11 21:46:07,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1302680.0, ans=0.2 2024-08-11 21:46:20,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1302780.0, ans=0.1 2024-08-11 21:46:24,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1302780.0, ans=0.0 2024-08-11 21:46:27,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14350, loss[loss=0.1079, beats_loss=0.01161, ecapa_loss=0.0001916, whisper_loss=0.09438, over 21907.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01131, ecapa_loss=0.0001869, whisper_loss=0.09308, over 3928912.24 frames. 
], batch size: 89, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:46:29,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1302880.0, ans=0.0 2024-08-11 21:46:32,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1302880.0, ans=0.125 2024-08-11 21:46:56,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.719e+01 2.981e+01 3.464e+01 5.321e+01, threshold=5.963e+01, percent-clipped=0.0 2024-08-11 21:47:00,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1303080.0, ans=0.0 2024-08-11 21:47:13,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1303180.0, ans=0.125 2024-08-11 21:47:33,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-11 21:47:34,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1303280.0, ans=0.125 2024-08-11 21:47:37,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14400, loss[loss=0.1241, beats_loss=0.009606, ecapa_loss=0.0001645, whisper_loss=0.1128, over 16634.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01131, ecapa_loss=0.0001879, whisper_loss=0.09248, over 3930434.33 frames. 
], batch size: 63, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:47:37,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1303380.0, ans=0.125 2024-08-11 21:48:00,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1303480.0, ans=0.125 2024-08-11 21:48:02,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1303480.0, ans=0.125 2024-08-11 21:48:04,010 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-11 21:48:11,971 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 14 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 21:48:28,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1303680.0, ans=0.0 2024-08-11 21:48:31,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1303780.0, ans=0.125 2024-08-11 21:48:34,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-11 21:48:45,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 9, batch 14450, loss[loss=0.09792, beats_loss=0.01344, ecapa_loss=0.0001745, whisper_loss=0.08274, over 22289.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01134, ecapa_loss=0.0001879, whisper_loss=0.09241, over 3939257.78 frames. 
], batch size: 92, lr: 6.83e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:48:49,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1303880.0, ans=0.125 2024-08-11 21:49:07,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2024-08-11 21:49:11,336 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 21:49:13,452 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.624e+01 2.900e+01 3.333e+01 5.803e+01, threshold=5.799e+01, percent-clipped=0.0 2024-08-11 21:49:21,713 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-11 21:49:23,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1304080.0, ans=0.125 2024-08-11 21:49:32,207 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-11 21:49:41,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1304280.0, ans=0.125 2024-08-11 21:49:45,810 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-9.pt 2024-08-11 21:50:30,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 0, loss[loss=0.09464, beats_loss=0.01137, ecapa_loss=0.0002099, whisper_loss=0.08118, over 23222.00 frames. ], tot_loss[loss=0.09464, beats_loss=0.01137, ecapa_loss=0.0002099, whisper_loss=0.08118, over 23222.00 frames. 
], batch size: 93, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:50:30,435 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 21:51:13,111 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on ASR_libri: loss=0.2568, beats_loss=0, ecapa_loss=0.0006206, whisper_loss=0.2506, over 922467.00 frames. 2024-08-11 21:51:29,348 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on SV_voxceleb1: loss=0.005051, beats_loss=0, ecapa_loss=0.0005051, whisper_loss=0, over 939242.00 frames. 2024-08-11 21:53:33,412 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on AT_audioset: loss=0.02495, beats_loss=0.02495, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 21:53:33,421 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 21:53:37,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1304320.0, ans=0.125 2024-08-11 21:54:14,506 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 21:54:31,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1304520.0, ans=0.05 2024-08-11 21:54:50,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. 
limit=15.0 2024-08-11 21:55:14,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1304620.0, ans=0.0 2024-08-11 21:55:27,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1304720.0, ans=0.125 2024-08-11 21:55:43,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 50, loss[loss=0.1271, beats_loss=0.008695, ecapa_loss=0.0001933, whisper_loss=0.1164, over 24198.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01033, ecapa_loss=0.000201, whisper_loss=0.091, over 841624.01 frames. ], batch size: 93, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:55:48,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1304820.0, ans=0.1 2024-08-11 21:56:36,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1305020.0, ans=0.125 2024-08-11 21:56:47,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.905e+01 3.307e+01 3.702e+01 5.786e+01, threshold=6.614e+01, percent-clipped=0.0 2024-08-11 21:56:56,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1305120.0, ans=0.125 2024-08-11 21:57:01,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1305120.0, ans=0.0 2024-08-11 21:57:23,938 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 21:57:33,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. 
limit=15.0 2024-08-11 21:57:37,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1305220.0, ans=0.2 2024-08-11 21:57:41,629 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 100, loss[loss=0.08837, beats_loss=0.01278, ecapa_loss=0.0001352, whisper_loss=0.07425, over 17253.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01037, ecapa_loss=0.0001941, whisper_loss=0.09368, over 1524570.75 frames. ], batch size: 67, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:57:46,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2024-08-11 21:57:53,245 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-11 21:57:56,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1305320.0, ans=0.5 2024-08-11 21:57:58,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1305320.0, ans=0.125 2024-08-11 21:58:05,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2024-08-11 21:58:09,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1305420.0, ans=0.125 2024-08-11 21:58:21,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1305420.0, ans=0.125 2024-08-11 21:58:28,004 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-11 21:58:35,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1305520.0, ans=0.2 2024-08-11 21:58:43,133 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 21:59:09,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1305620.0, ans=0.2 2024-08-11 21:59:14,527 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-11 21:59:21,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1305720.0, ans=0.125 2024-08-11 21:59:32,439 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 150, loss[loss=0.09798, beats_loss=0.01207, ecapa_loss=0.000192, whisper_loss=0.08399, over 14375.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01049, ecapa_loss=0.0001952, whisper_loss=0.09194, over 2028703.31 frames. ], batch size: 57, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 21:59:37,059 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-11 22:00:20,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.798e+01 3.187e+01 3.633e+01 2.129e+02, threshold=6.375e+01, percent-clipped=1.0 2024-08-11 22:00:38,722 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 22:00:49,115 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 22:00:58,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 200, loss[loss=0.1439, beats_loss=0.008077, ecapa_loss=0.0001896, whisper_loss=0.134, over 20817.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01053, ecapa_loss=0.0001947, whisper_loss=0.0927, over 2409386.10 frames. 
], batch size: 76, lr: 6.49e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:01:05,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1306320.0, ans=0.125 2024-08-11 22:01:05,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1306320.0, ans=0.125 2024-08-11 22:01:17,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1306420.0, ans=0.125 2024-08-11 22:01:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1306420.0, ans=0.0 2024-08-11 22:01:23,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:01:24,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1306420.0, ans=0.09899494936611666 2024-08-11 22:01:47,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1306620.0, ans=0.125 2024-08-11 22:01:52,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1306620.0, ans=0.125 2024-08-11 22:01:58,125 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-11 22:02:03,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1306720.0, ans=0.1 2024-08-11 22:02:10,098 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
16 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-11 22:02:10,476 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-11 22:02:12,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1306720.0, ans=0.125 2024-08-11 22:02:14,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 250, loss[loss=0.1115, beats_loss=0.008455, ecapa_loss=0.0002019, whisper_loss=0.101, over 14690.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0107, ecapa_loss=0.0001908, whisper_loss=0.09218, over 2708751.58 frames. ], batch size: 57, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:02:18,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-11 22:02:25,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1306820.0, ans=0.0 2024-08-11 22:02:27,005 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-11 22:02:27,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1306820.0, ans=0.1 2024-08-11 22:02:28,791 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-11 22:02:31,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:36,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:40,589 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
27 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-11 22:02:40,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1306920.0, ans=0.5 2024-08-11 22:02:40,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1306920.0, ans=0.125 2024-08-11 22:02:45,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1307020.0, ans=0.0 2024-08-11 22:02:49,488 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-11 22:02:57,176 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.454e+01 2.692e+01 3.153e+01 8.296e+01, threshold=5.384e+01, percent-clipped=2.0 2024-08-11 22:03:31,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 300, loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.000204, whisper_loss=0.08823, over 22849.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01074, ecapa_loss=0.0001911, whisper_loss=0.09238, over 2968729.42 frames. ], batch size: 94, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:03:40,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1307320.0, ans=0.125 2024-08-11 22:03:51,612 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 22:04:02,580 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-11 22:04:46,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 350, loss[loss=0.112, beats_loss=0.01006, ecapa_loss=0.000209, whisper_loss=0.09987, over 22001.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01072, ecapa_loss=0.0001893, whisper_loss=0.09244, over 3160448.68 frames. 
], batch size: 91, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:04:47,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2024-08-11 22:04:55,390 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 22:05:01,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1307920.0, ans=0.035 2024-08-11 22:05:20,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1308020.0, ans=0.0 2024-08-11 22:05:21,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1308020.0, ans=0.125 2024-08-11 22:05:26,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.513e+01 2.913e+01 3.282e+01 4.748e+01, threshold=5.825e+01, percent-clipped=0.0 2024-08-11 22:05:27,787 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-11 22:06:01,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 400, loss[loss=0.1453, beats_loss=0.0057, ecapa_loss=0.0001958, whisper_loss=0.1377, over 19227.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001877, whisper_loss=0.09228, over 3324004.09 frames. ], batch size: 69, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:06:09,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1308320.0, ans=0.125 2024-08-11 22:06:15,531 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-11 22:06:18,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1308420.0, ans=0.125 2024-08-11 22:06:20,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308420.0, ans=0.1 2024-08-11 22:06:27,155 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-11 22:06:51,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308620.0, ans=0.1 2024-08-11 22:07:04,923 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.985e+00 2024-08-11 22:07:07,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1308720.0, ans=0.125 2024-08-11 22:07:11,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1308720.0, ans=0.2 2024-08-11 22:07:12,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1308720.0, ans=0.1 2024-08-11 22:07:17,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 450, loss[loss=0.1038, beats_loss=0.01193, ecapa_loss=0.0001906, whisper_loss=0.08994, over 17547.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01085, ecapa_loss=0.0001865, whisper_loss=0.0928, over 3467100.67 frames. ], batch size: 72, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:07:33,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2024-08-11 22:07:51,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1309020.0, ans=0.0 2024-08-11 22:07:55,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1309020.0, ans=0.125 2024-08-11 22:07:57,958 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.716e-01 2024-08-11 22:07:58,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.488e+01 3.017e+01 3.515e+01 8.522e+01, threshold=6.035e+01, percent-clipped=1.0 2024-08-11 22:08:16,991 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-11 22:08:24,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1309220.0, ans=0.07 2024-08-11 22:08:26,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1309220.0, ans=0.125 2024-08-11 22:08:33,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 500, loss[loss=0.1048, beats_loss=0.01039, ecapa_loss=0.0001991, whisper_loss=0.09246, over 20176.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01091, ecapa_loss=0.0001848, whisper_loss=0.09208, over 3528067.88 frames. 
], batch size: 78, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:08:33,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1309320.0, ans=0.125 2024-08-11 22:08:47,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1309420.0, ans=0.125 2024-08-11 22:08:48,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1309420.0, ans=0.04949747468305833 2024-08-11 22:08:51,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1309420.0, ans=0.0 2024-08-11 22:08:53,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1309420.0, ans=0.125 2024-08-11 22:09:00,048 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-11 22:09:05,747 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-11 22:09:10,321 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
18 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 22:09:10,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1309520.0, ans=0.125 2024-08-11 22:09:18,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1309620.0, ans=0.1 2024-08-11 22:09:28,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1309620.0, ans=0.125 2024-08-11 22:09:46,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1309720.0, ans=0.0 2024-08-11 22:09:50,828 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 550, loss[loss=0.1121, beats_loss=0.01081, ecapa_loss=0.0001736, whisper_loss=0.09954, over 22502.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001839, whisper_loss=0.09248, over 3617505.04 frames. ], batch size: 91, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:09:57,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2024-08-11 22:09:58,665 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 22:10:11,442 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-11 22:10:19,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1309920.0, ans=0.1 2024-08-11 22:10:26,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1310020.0, ans=0.0 2024-08-11 22:10:28,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1310020.0, ans=0.125 2024-08-11 22:10:32,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.567e+01 3.001e+01 3.540e+01 6.068e+01, threshold=6.003e+01, percent-clipped=1.0 2024-08-11 22:10:48,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1310120.0, ans=0.0 2024-08-11 22:10:49,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1310120.0, ans=0.0 2024-08-11 22:11:07,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 600, loss[loss=0.09511, beats_loss=0.01199, ecapa_loss=0.0001979, whisper_loss=0.08115, over 15880.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.000184, whisper_loss=0.09257, over 3704617.81 frames. ], batch size: 65, lr: 6.48e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:11:08,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1310320.0, ans=0.125 2024-08-11 22:11:17,018 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-11 22:11:41,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1310520.0, ans=6.0 2024-08-11 22:11:44,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1310520.0, ans=0.0 2024-08-11 22:12:12,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2024-08-11 22:12:19,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1310720.0, ans=0.125 2024-08-11 22:12:22,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1310820.0, ans=0.0 2024-08-11 22:12:23,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 650, loss[loss=0.1227, beats_loss=0.01, ecapa_loss=0.0001696, whisper_loss=0.111, over 18509.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0109, ecapa_loss=0.0001842, whisper_loss=0.09263, over 3717111.02 frames. ], batch size: 73, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:12:37,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2024-08-11 22:12:38,071 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-11 22:12:49,556 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-11 22:12:54,877 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-11 22:13:00,485 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.317e+02 2024-08-11 22:13:04,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.576e+01 2.785e+01 3.016e+01 3.995e+01, threshold=5.570e+01, percent-clipped=0.0 2024-08-11 22:13:09,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2024-08-11 22:13:14,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1311120.0, ans=0.0 2024-08-11 22:13:40,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 700, loss[loss=0.1232, beats_loss=0.01108, ecapa_loss=0.0001963, whisper_loss=0.1101, over 14945.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01092, ecapa_loss=0.0001859, whisper_loss=0.09246, over 3710657.45 frames. ], batch size: 59, lr: 6.47e-03, grad_scale: 5.764607523034235e+17 2024-08-11 22:13:59,061 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 22:14:03,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1311420.0, ans=0.0 2024-08-11 22:14:03,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1311420.0, ans=0.125 2024-08-11 22:14:30,765 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-11 22:14:45,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1311720.0, ans=0.04949747468305833 2024-08-11 22:14:54,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1311720.0, ans=0.0 2024-08-11 22:15:00,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 750, loss[loss=0.09398, beats_loss=0.01379, ecapa_loss=0.0001492, whisper_loss=0.0787, over 20721.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01098, ecapa_loss=0.0001837, whisper_loss=0.0925, over 3741475.90 frames. ], batch size: 84, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:15:36,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1312020.0, ans=0.05 2024-08-11 22:15:41,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.583e+01 2.835e+01 3.303e+01 6.155e+01, threshold=5.670e+01, percent-clipped=2.0 2024-08-11 22:15:57,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1312120.0, ans=0.125 2024-08-11 22:15:59,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0 2024-08-11 22:16:17,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 800, loss[loss=0.1122, beats_loss=0.01143, ecapa_loss=0.0001749, whisper_loss=0.09904, over 22688.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.0001838, whisper_loss=0.09238, over 3779650.78 frames. 
], batch size: 87, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:16:32,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1312320.0, ans=0.125 2024-08-11 22:16:34,245 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-11 22:16:36,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-11 22:16:57,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1312520.0, ans=0.125 2024-08-11 22:17:06,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1312520.0, ans=0.0 2024-08-11 22:17:10,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-08-11 22:17:13,790 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-11 22:17:15,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1312620.0, ans=0.5 2024-08-11 22:17:16,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-11 22:17:25,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1312620.0, ans=0.0 2024-08-11 22:17:52,218 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 850, loss[loss=0.0859, beats_loss=0.01321, ecapa_loss=0.0001616, whisper_loss=0.07108, over 16695.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.000184, whisper_loss=0.09194, over 3788621.81 frames. 
], batch size: 66, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:17:57,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=22.5 2024-08-11 22:18:02,414 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-11 22:18:05,580 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-11 22:18:40,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2024-08-11 22:18:41,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.539e+01 2.872e+01 3.296e+01 5.215e+01, threshold=5.743e+01, percent-clipped=0.0 2024-08-11 22:18:55,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1313120.0, ans=0.125 2024-08-11 22:19:06,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1313220.0, ans=0.125 2024-08-11 22:19:25,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 900, loss[loss=0.09161, beats_loss=0.009653, ecapa_loss=0.0001766, whisper_loss=0.08019, over 22679.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001826, whisper_loss=0.09131, over 3780250.09 frames. ], batch size: 89, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:19:44,255 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-11 22:20:01,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1313420.0, ans=0.1 2024-08-11 22:20:13,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1313520.0, ans=0.125 2024-08-11 22:20:48,602 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-11 22:21:02,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 950, loss[loss=0.1028, beats_loss=0.01261, ecapa_loss=0.000152, whisper_loss=0.0887, over 22743.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001814, whisper_loss=0.09172, over 3806809.92 frames. ], batch size: 91, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:21:08,451 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-11 22:21:12,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-11 22:21:19,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1313920.0, ans=0.125 2024-08-11 22:21:25,975 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 22:21:36,689 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-11 22:21:36,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1314020.0, ans=0.125 2024-08-11 22:21:51,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.622e+01 2.859e+01 3.329e+01 4.580e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-11 22:22:02,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-08-11 22:22:17,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1314220.0, ans=0.0 2024-08-11 22:22:34,432 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1000, loss[loss=0.09699, beats_loss=0.01095, ecapa_loss=0.0001956, whisper_loss=0.08408, over 17723.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01103, ecapa_loss=0.0001809, whisper_loss=0.0918, over 3791180.53 frames. ], batch size: 71, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:22:48,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1314320.0, ans=0.125 2024-08-11 22:22:50,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1314320.0, ans=0.0 2024-08-11 22:22:55,239 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-11 22:23:47,281 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-11 22:23:52,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1314720.0, ans=0.035 2024-08-11 22:24:00,281 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 22:24:01,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1050, loss[loss=0.1001, beats_loss=0.01486, ecapa_loss=0.000158, whisper_loss=0.08364, over 22493.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01102, ecapa_loss=0.0001815, whisper_loss=0.09152, over 3779114.12 frames. ], batch size: 89, lr: 6.47e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:24:12,817 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 22:24:17,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.48 vs. limit=22.5 2024-08-11 22:24:19,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1314920.0, ans=0.04949747468305833 2024-08-11 22:24:27,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1315020.0, ans=0.1 2024-08-11 22:24:38,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.421e+01 2.684e+01 3.099e+01 9.894e+01, threshold=5.368e+01, percent-clipped=2.0 2024-08-11 22:24:40,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1315020.0, ans=0.025 2024-08-11 22:24:56,392 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 13 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-11 22:24:59,002 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-11 22:25:02,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. 
limit=15.0 2024-08-11 22:25:09,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1100, loss[loss=0.09159, beats_loss=0.01256, ecapa_loss=0.000214, whisper_loss=0.07689, over 21844.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01105, ecapa_loss=0.0001804, whisper_loss=0.09135, over 3782174.57 frames. ], batch size: 93, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:25:19,602 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.307e+00 2024-08-11 22:25:32,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1315420.0, ans=0.1 2024-08-11 22:25:33,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1315420.0, ans=0.125 2024-08-11 22:25:39,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=1315520.0, ans=0.02 2024-08-11 22:25:56,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1315620.0, ans=0.2 2024-08-11 22:26:02,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1315720.0, ans=0.0 2024-08-11 22:26:04,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1315720.0, ans=0.125 2024-08-11 22:26:10,832 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-11 22:26:17,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1150, loss[loss=0.07485, beats_loss=0.01178, ecapa_loss=0.0001587, whisper_loss=0.06148, over 16723.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01103, ecapa_loss=0.0001804, whisper_loss=0.09209, over 3806804.90 frames. 
], batch size: 67, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:26:22,314 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-11 22:26:25,011 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 22:26:30,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1315920.0, ans=0.125 2024-08-11 22:26:34,484 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-11 22:26:39,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-08-11 22:26:54,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.627e+01 2.933e+01 3.282e+01 4.582e+01, threshold=5.866e+01, percent-clipped=0.0 2024-08-11 22:27:05,192 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 22:27:17,120 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 42 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-11 22:27:24,268 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 22:27:26,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1200, loss[loss=0.102, beats_loss=0.01178, ecapa_loss=0.0001804, whisper_loss=0.08844, over 16269.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001795, whisper_loss=0.09177, over 3800680.82 frames. 
], batch size: 64, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:27:30,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1316320.0, ans=0.2 2024-08-11 22:28:03,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-11 22:28:13,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=12.0 2024-08-11 22:28:19,324 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-11 22:28:24,705 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-11 22:28:35,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-11 22:28:35,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1250, loss[loss=0.1016, beats_loss=0.008715, ecapa_loss=0.0001349, whisper_loss=0.09153, over 16765.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01118, ecapa_loss=0.0001795, whisper_loss=0.09092, over 3770565.02 frames. ], batch size: 59, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:28:47,808 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
20 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-11 22:29:11,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.416e+01 2.652e+01 2.971e+01 4.212e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-11 22:29:35,513 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.145e+00 2024-08-11 22:29:36,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1317220.0, ans=0.125 2024-08-11 22:29:43,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1300, loss[loss=0.1259, beats_loss=0.006997, ecapa_loss=0.0001615, whisper_loss=0.1172, over 17450.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01125, ecapa_loss=0.0001772, whisper_loss=0.09097, over 3787789.21 frames. ], batch size: 62, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:29:49,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1317320.0, ans=0.2 2024-08-11 22:29:58,565 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 22:30:00,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-08-11 22:30:06,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-08-11 22:30:08,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1317420.0, ans=0.125 2024-08-11 22:30:28,788 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 22:30:41,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1317720.0, ans=0.125 2024-08-11 22:30:42,331 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-11 22:30:46,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1317720.0, ans=0.0 2024-08-11 22:30:51,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1350, loss[loss=0.1344, beats_loss=0.009282, ecapa_loss=0.0001519, whisper_loss=0.1236, over 24251.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01122, ecapa_loss=0.0001761, whisper_loss=0.09072, over 3807188.90 frames. ], batch size: 89, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:30:55,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-11 22:30:58,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1317820.0, ans=0.0 2024-08-11 22:31:04,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1317920.0, ans=0.125 2024-08-11 22:31:18,373 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 22:31:22,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1318020.0, ans=0.125 2024-08-11 22:31:28,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.605e+01 2.891e+01 3.294e+01 5.251e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-11 22:31:36,422 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 22:31:49,819 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-11 22:32:00,651 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1400, loss[loss=0.1019, beats_loss=0.01004, ecapa_loss=0.0001488, whisper_loss=0.09039, over 15657.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01117, ecapa_loss=0.0001771, whisper_loss=0.09079, over 3799942.11 frames. ], batch size: 58, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:32:00,885 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-11 22:32:15,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2024-08-11 22:32:17,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1318420.0, ans=0.0 2024-08-11 22:32:32,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1318520.0, ans=0.0 2024-08-11 22:32:33,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1318520.0, ans=0.035 2024-08-11 22:32:36,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1318520.0, ans=0.0 2024-08-11 22:32:38,601 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-11 22:32:42,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.25 vs. limit=15.0 2024-08-11 22:32:45,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. 
limit=22.5 2024-08-11 22:32:49,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1318620.0, ans=0.0 2024-08-11 22:33:00,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1318720.0, ans=0.1 2024-08-11 22:33:11,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1450, loss[loss=0.09051, beats_loss=0.01307, ecapa_loss=0.0001725, whisper_loss=0.07571, over 17805.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01118, ecapa_loss=0.0001782, whisper_loss=0.09042, over 3798109.20 frames. ], batch size: 72, lr: 6.46e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:33:38,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=12.0 2024-08-11 22:34:12,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.426e+01 2.679e+01 3.124e+01 8.618e+01, threshold=5.357e+01, percent-clipped=1.0 2024-08-11 22:34:16,123 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 12 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 22:34:20,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1319120.0, ans=0.0 2024-08-11 22:34:34,151 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 22:34:34,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1319220.0, ans=0.125 2024-08-11 22:34:45,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1500, loss[loss=0.1143, beats_loss=0.01156, ecapa_loss=0.0001629, whisper_loss=0.1011, over 20343.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01122, ecapa_loss=0.0001781, whisper_loss=0.09032, over 3820283.97 frames. 
], batch size: 78, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:34:46,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1319320.0, ans=0.125 2024-08-11 22:34:51,107 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 22:35:05,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1319420.0, ans=0.125 2024-08-11 22:35:18,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1319520.0, ans=0.09899494936611666 2024-08-11 22:35:18,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-11 22:35:33,052 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-11 22:35:33,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-08-11 22:35:40,176 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-11 22:35:45,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1319720.0, ans=0.0 2024-08-11 22:35:54,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. 
limit=15.0 2024-08-11 22:35:55,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1319720.0, ans=0.125 2024-08-11 22:35:57,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1550, loss[loss=0.1422, beats_loss=0.008763, ecapa_loss=0.0001671, whisper_loss=0.1318, over 16681.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01116, ecapa_loss=0.0001774, whisper_loss=0.09093, over 3792325.56 frames. ], batch size: 61, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:35:58,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-08-11 22:36:06,204 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-11 22:36:06,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1319820.0, ans=0.025 2024-08-11 22:36:06,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1319820.0, ans=0.1 2024-08-11 22:36:17,720 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-11 22:36:21,771 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-132000.pt 2024-08-11 22:36:28,534 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
27 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-11 22:36:34,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1320020.0, ans=0.125 2024-08-11 22:36:37,882 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.581e+01 2.864e+01 3.252e+01 1.978e+02, threshold=5.728e+01, percent-clipped=3.0 2024-08-11 22:36:39,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2024-08-11 22:36:40,803 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-11 22:36:42,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-08-11 22:36:49,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1320120.0, ans=0.2 2024-08-11 22:36:57,460 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-11 22:37:09,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1600, loss[loss=0.131, beats_loss=0.008911, ecapa_loss=0.0001301, whisper_loss=0.1208, over 18634.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0111, ecapa_loss=0.0001778, whisper_loss=0.09125, over 3800426.88 frames. ], batch size: 67, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:37:11,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1320320.0, ans=0.0 2024-08-11 22:37:21,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1320320.0, ans=0.2 2024-08-11 22:37:29,234 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
17 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-11 22:37:33,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1320420.0, ans=0.125 2024-08-11 22:37:44,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.87 vs. limit=22.5 2024-08-11 22:37:58,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1320620.0, ans=0.125 2024-08-11 22:38:07,218 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 22:38:17,409 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1650, loss[loss=0.1388, beats_loss=0.009068, ecapa_loss=0.0001604, whisper_loss=0.1281, over 23521.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01112, ecapa_loss=0.0001771, whisper_loss=0.09116, over 3785075.87 frames. ], batch size: 87, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:38:29,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1320820.0, ans=0.0 2024-08-11 22:38:32,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1320920.0, ans=0.0 2024-08-11 22:38:32,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.90 vs. limit=22.5 2024-08-11 22:38:46,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=12.0 2024-08-11 22:38:55,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.537e+01 2.816e+01 3.147e+01 5.584e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-11 22:39:27,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1700, loss[loss=0.1065, beats_loss=0.007046, ecapa_loss=0.0002152, whisper_loss=0.09731, over 17290.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01106, ecapa_loss=0.000176, whisper_loss=0.09125, over 3771680.46 frames. ], batch size: 67, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:39:41,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1321420.0, ans=0.125 2024-08-11 22:39:49,157 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 22:39:49,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1321420.0, ans=0.035 2024-08-11 22:39:51,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1321420.0, ans=0.2 2024-08-11 22:40:02,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1321520.0, ans=0.1 2024-08-11 22:40:11,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1321620.0, ans=0.125 2024-08-11 22:40:18,729 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 8 from Vox, 38 fro AS 2024-08-11 22:40:35,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-08-11 22:40:35,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1750, loss[loss=0.08189, beats_loss=0.01063, ecapa_loss=0.0001698, whisper_loss=0.06956, over 20423.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01097, ecapa_loss=0.0001758, whisper_loss=0.09168, over 3809097.07 frames. ], batch size: 83, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:40:42,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1321820.0, ans=0.125 2024-08-11 22:41:01,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1322020.0, ans=0.125 2024-08-11 22:41:04,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1322020.0, ans=0.125 2024-08-11 22:41:06,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-11 22:41:12,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.558e+01 2.941e+01 3.375e+01 5.382e+01, threshold=5.883e+01, percent-clipped=0.0 2024-08-11 22:41:17,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1322120.0, ans=0.125 2024-08-11 22:41:38,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1322220.0, ans=0.2 2024-08-11 22:41:45,282 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1800, loss[loss=0.1183, beats_loss=0.01085, ecapa_loss=0.0001723, whisper_loss=0.1058, over 21616.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01093, ecapa_loss=0.0001759, whisper_loss=0.09162, over 3838251.66 frames. ], batch size: 86, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:42:01,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. 
limit=15.0 2024-08-11 22:42:04,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2024-08-11 22:42:18,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1322520.0, ans=0.1 2024-08-11 22:42:29,188 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 22:42:55,668 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-11 22:42:58,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1850, loss[loss=0.1204, beats_loss=0.009922, ecapa_loss=0.0001935, whisper_loss=0.1085, over 23874.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01084, ecapa_loss=0.0001773, whisper_loss=0.09217, over 3849449.30 frames. ], batch size: 93, lr: 6.45e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:43:21,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1322920.0, ans=0.125 2024-08-11 22:43:25,810 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-11 22:43:36,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1323020.0, ans=0.95 2024-08-11 22:43:39,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.593e+01 2.917e+01 3.347e+01 7.328e+01, threshold=5.834e+01, percent-clipped=1.0 2024-08-11 22:43:50,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1323120.0, ans=0.0 2024-08-11 22:44:05,707 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
18 from LS+wenet, 21 from Vox, 15 fro AS 2024-08-11 22:44:05,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1323220.0, ans=0.1 2024-08-11 22:44:12,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1900, loss[loss=0.1023, beats_loss=0.01144, ecapa_loss=0.0002043, whisper_loss=0.08879, over 22565.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001792, whisper_loss=0.09142, over 3820540.90 frames. ], batch size: 94, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:44:12,611 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-11 22:44:16,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1323320.0, ans=0.1 2024-08-11 22:44:19,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-11 22:44:27,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1323420.0, ans=0.125 2024-08-11 22:44:37,489 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-11 22:44:50,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2024-08-11 22:44:56,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1323620.0, ans=0.025 2024-08-11 22:44:57,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1323620.0, ans=0.0 2024-08-11 22:45:24,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 1950, loss[loss=0.08679, beats_loss=0.01422, ecapa_loss=0.0001415, whisper_loss=0.07116, over 16273.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01107, ecapa_loss=0.0001794, whisper_loss=0.09106, over 3822290.50 frames. ], batch size: 63, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:45:35,645 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-11 22:45:50,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1323920.0, ans=0.125 2024-08-11 22:45:54,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1324020.0, ans=0.0 2024-08-11 22:46:02,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.530e+01 2.923e+01 3.581e+01 1.963e+02, threshold=5.846e+01, percent-clipped=3.0 2024-08-11 22:46:18,780 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-11 22:46:28,957 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 14 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-11 22:46:30,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1324220.0, ans=0.125 2024-08-11 22:46:36,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2000, loss[loss=0.09673, beats_loss=0.01105, ecapa_loss=0.0002044, whisper_loss=0.08364, over 22093.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01113, ecapa_loss=0.0001803, whisper_loss=0.09063, over 3799957.50 frames. ], batch size: 90, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:46:41,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1324320.0, ans=0.0 2024-08-11 22:46:57,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1324420.0, ans=0.125 2024-08-11 22:47:02,405 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-11 22:47:07,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1324520.0, ans=0.0 2024-08-11 22:47:22,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1324620.0, ans=0.125 2024-08-11 22:47:29,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-08-11 22:47:31,876 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 22:47:32,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1324620.0, ans=0.125 2024-08-11 22:47:33,227 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-11 22:47:37,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1324720.0, ans=0.0 2024-08-11 22:47:51,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2050, loss[loss=0.1068, beats_loss=0.01229, ecapa_loss=0.000135, whisper_loss=0.09313, over 19333.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01115, ecapa_loss=0.0001812, whisper_loss=0.09099, over 3815509.42 frames. 
], batch size: 72, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:48:26,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1325020.0, ans=0.125 2024-08-11 22:48:30,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.631e+01 3.014e+01 3.370e+01 4.766e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 22:48:39,686 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-11 22:48:46,504 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-11 22:49:03,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2100, loss[loss=0.1181, beats_loss=0.01169, ecapa_loss=0.0001873, whisper_loss=0.1046, over 22024.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01115, ecapa_loss=0.0001813, whisper_loss=0.09069, over 3792416.97 frames. ], batch size: 87, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:49:41,717 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-11 22:50:06,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2024-08-11 22:50:09,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1325720.0, ans=0.125 2024-08-11 22:50:11,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1325720.0, ans=0.0 2024-08-11 22:50:13,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1325720.0, ans=0.0 2024-08-11 22:50:18,673 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2150, loss[loss=0.1173, beats_loss=0.009299, ecapa_loss=0.0002012, whisper_loss=0.1059, over 22013.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.0112, ecapa_loss=0.000182, whisper_loss=0.09091, over 3799201.43 frames. ], batch size: 88, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:50:21,648 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-11 22:50:23,484 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-11 22:50:24,686 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 10 from Vox, 44 fro AS 2024-08-11 22:50:30,046 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-11 22:50:31,264 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-11 22:50:37,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1325920.0, ans=0.1 2024-08-11 22:50:38,590 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-11 22:50:39,998 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 22:50:53,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1326020.0, ans=0.025 2024-08-11 22:50:56,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=8.0 2024-08-11 22:50:57,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.560e+01 2.828e+01 3.267e+01 5.795e+01, threshold=5.656e+01, percent-clipped=0.0 2024-08-11 22:50:57,572 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-11 22:51:07,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1326120.0, ans=0.125 2024-08-11 22:51:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2024-08-11 22:51:31,052 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2200, loss[loss=0.1077, beats_loss=0.01093, ecapa_loss=0.0002022, whisper_loss=0.0947, over 21970.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01122, ecapa_loss=0.0001814, whisper_loss=0.0915, over 3827924.14 frames. ], batch size: 89, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:51:39,617 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-11 22:51:41,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2024-08-11 22:52:17,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1326620.0, ans=0.125 2024-08-11 22:52:21,972 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 22:52:32,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-11 22:52:36,834 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-11 22:52:44,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2250, loss[loss=0.1048, beats_loss=0.01346, ecapa_loss=0.0001584, whisper_loss=0.08971, over 23154.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01128, ecapa_loss=0.0001832, whisper_loss=0.09119, over 3853439.58 frames. 
], batch size: 90, lr: 6.44e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:52:52,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1326820.0, ans=0.0 2024-08-11 22:53:18,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1327020.0, ans=0.0 2024-08-11 22:53:24,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.654e+01 2.933e+01 3.292e+01 6.746e+01, threshold=5.867e+01, percent-clipped=1.0 2024-08-11 22:53:48,400 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-11 22:53:48,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1327220.0, ans=0.125 2024-08-11 22:53:55,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1327220.0, ans=0.125 2024-08-11 22:53:56,999 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-11 22:53:58,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2300, loss[loss=0.1178, beats_loss=0.008043, ecapa_loss=0.0002202, whisper_loss=0.1076, over 20720.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01133, ecapa_loss=0.0001831, whisper_loss=0.091, over 3850836.04 frames. ], batch size: 84, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:53:58,367 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-11 22:54:05,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1327320.0, ans=0.1 2024-08-11 22:54:12,933 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
16 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-11 22:55:12,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2350, loss[loss=0.1079, beats_loss=0.009375, ecapa_loss=0.0001827, whisper_loss=0.09675, over 18535.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01122, ecapa_loss=0.0001838, whisper_loss=0.09113, over 3827956.62 frames. ], batch size: 71, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:55:14,488 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 22:55:16,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1327820.0, ans=0.2 2024-08-11 22:55:23,365 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-11 22:55:29,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-11 22:55:30,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1327920.0, ans=0.0 2024-08-11 22:55:33,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1327920.0, ans=0.125 2024-08-11 22:55:36,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1327920.0, ans=0.1 2024-08-11 22:55:52,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.549e+01 2.872e+01 3.307e+01 6.850e+01, threshold=5.744e+01, percent-clipped=1.0 2024-08-11 22:55:56,853 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-11 22:56:25,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2400, loss[loss=0.1069, beats_loss=0.01072, ecapa_loss=0.0001867, whisper_loss=0.09432, over 22394.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.000185, whisper_loss=0.09157, over 3873564.37 frames. ], batch size: 88, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:56:33,154 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-11 22:56:34,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1328320.0, ans=0.0 2024-08-11 22:56:46,401 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-11 22:56:46,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1328420.0, ans=0.125 2024-08-11 22:56:53,511 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-11 22:57:01,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:57:07,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1328520.0, ans=0.125 2024-08-11 22:57:11,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1328620.0, ans=0.0 2024-08-11 22:57:22,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1328620.0, ans=0.5 2024-08-11 22:57:31,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1328720.0, ans=0.125 2024-08-11 22:57:41,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2450, loss[loss=0.09688, beats_loss=0.01055, ecapa_loss=0.000188, whisper_loss=0.08445, over 19577.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01114, ecapa_loss=0.000184, whisper_loss=0.09222, over 3867163.94 frames. ], batch size: 79, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:57:44,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-08-11 22:57:49,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1328820.0, ans=0.125 2024-08-11 22:58:03,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1328920.0, ans=0.0 2024-08-11 22:58:04,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1328920.0, ans=0.0 2024-08-11 22:58:06,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. 
limit=15.0 2024-08-11 22:58:16,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1329020.0, ans=0.125 2024-08-11 22:58:16,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1329020.0, ans=0.125 2024-08-11 22:58:16,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1329020.0, ans=0.0 2024-08-11 22:58:19,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1329020.0, ans=0.125 2024-08-11 22:58:21,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.108e+01 2.487e+01 2.806e+01 3.226e+01 5.199e+01, threshold=5.611e+01, percent-clipped=0.0 2024-08-11 22:58:26,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1329120.0, ans=0.09899494936611666 2024-08-11 22:58:31,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2024-08-11 22:58:32,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=1329120.0, ans=15.0 2024-08-11 22:58:56,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2500, loss[loss=0.1065, beats_loss=0.0103, ecapa_loss=0.0002351, whisper_loss=0.09384, over 21559.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01116, ecapa_loss=0.0001852, whisper_loss=0.09175, over 3886868.14 frames. 
], batch size: 89, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 22:58:59,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1329320.0, ans=0.0 2024-08-11 22:59:06,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1329320.0, ans=0.125 2024-08-11 22:59:09,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1329420.0, ans=0.125 2024-08-11 22:59:10,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-11 22:59:13,246 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-11 22:59:24,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-08-11 22:59:52,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1329620.0, ans=0.07 2024-08-11 22:59:52,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=15.0 2024-08-11 22:59:59,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-11 23:00:08,102 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-11 23:00:14,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2550, loss[loss=0.09794, beats_loss=0.01037, ecapa_loss=0.0001627, whisper_loss=0.08595, over 15067.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01116, ecapa_loss=0.0001845, whisper_loss=0.0919, over 3859701.06 frames. ], batch size: 60, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:00:27,837 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-11 23:00:28,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=22.5 2024-08-11 23:00:34,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1329920.0, ans=0.1 2024-08-11 23:00:35,506 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-11 23:00:47,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1330020.0, ans=0.09899494936611666 2024-08-11 23:00:57,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.600e+01 2.873e+01 3.308e+01 4.841e+01, threshold=5.745e+01, percent-clipped=0.0 2024-08-11 23:01:23,992 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-11 23:01:27,030 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-11 23:01:27,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1330220.0, ans=0.2 2024-08-11 23:01:34,684 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2600, loss[loss=0.09532, beats_loss=0.01076, ecapa_loss=0.0001704, whisper_loss=0.08285, over 16224.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01115, ecapa_loss=0.0001846, whisper_loss=0.09212, over 3868578.34 frames. 
], batch size: 62, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:01:49,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=12.0 2024-08-11 23:02:05,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2024-08-11 23:02:14,355 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 23:02:18,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1330520.0, ans=0.125 2024-08-11 23:02:24,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1330620.0, ans=0.5 2024-08-11 23:02:28,809 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-11 23:02:35,102 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 23:02:38,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1330720.0, ans=0.02 2024-08-11 23:02:47,198 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 10 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-11 23:02:47,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1330720.0, ans=0.1 2024-08-11 23:02:51,792 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2650, loss[loss=0.1109, beats_loss=0.01036, ecapa_loss=0.000196, whisper_loss=0.09859, over 21914.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01123, ecapa_loss=0.0001854, whisper_loss=0.09149, over 3871896.23 frames. 
], batch size: 85, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:02:52,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1330820.0, ans=0.125 2024-08-11 23:02:53,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=1330820.0, ans=0.02 2024-08-11 23:03:13,057 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-11 23:03:24,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1331020.0, ans=0.0 2024-08-11 23:03:26,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-08-11 23:03:27,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1331020.0, ans=0.125 2024-08-11 23:03:34,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1331020.0, ans=0.125 2024-08-11 23:03:35,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.699e+01 2.993e+01 3.555e+01 9.155e+01, threshold=5.987e+01, percent-clipped=1.0 2024-08-11 23:03:49,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1331120.0, ans=0.125 2024-08-11 23:04:12,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331320.0, ans=0.1 2024-08-11 23:04:13,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2700, loss[loss=0.09086, beats_loss=0.01317, ecapa_loss=0.0001621, whisper_loss=0.07607, over 20933.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.01121, ecapa_loss=0.0001846, whisper_loss=0.09182, over 3870961.31 frames. ], batch size: 83, lr: 6.43e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:04:20,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-11 23:04:37,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1331420.0, ans=0.1 2024-08-11 23:04:46,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1331520.0, ans=0.125 2024-08-11 23:04:49,352 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 35 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 23:04:55,899 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-11 23:05:01,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1331620.0, ans=0.125 2024-08-11 23:05:05,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-11 23:05:32,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2750, loss[loss=0.1295, beats_loss=0.006412, ecapa_loss=0.0002258, whisper_loss=0.1209, over 21982.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001852, whisper_loss=0.09199, over 3805784.83 frames. 
], batch size: 83, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:05:33,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1331820.0, ans=0.0 2024-08-11 23:05:36,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1331820.0, ans=0.125 2024-08-11 23:05:59,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1331920.0, ans=0.125 2024-08-11 23:06:18,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.646e+01 2.996e+01 3.308e+01 5.705e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-11 23:06:31,608 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 23:06:35,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1332120.0, ans=0.1 2024-08-11 23:06:54,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2800, loss[loss=0.1041, beats_loss=0.01046, ecapa_loss=0.0001681, whisper_loss=0.09194, over 19982.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01124, ecapa_loss=0.0001845, whisper_loss=0.09158, over 3828906.58 frames. ], batch size: 77, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:07:00,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1332320.0, ans=0.1 2024-08-11 23:07:23,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.05 vs. 
limit=15.0 2024-08-11 23:07:33,223 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.370e-01 2024-08-11 23:07:40,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1332620.0, ans=0.1 2024-08-11 23:07:50,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1332620.0, ans=0.1 2024-08-11 23:07:52,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1332620.0, ans=0.0 2024-08-11 23:08:08,426 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 23:08:12,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1332820.0, ans=0.125 2024-08-11 23:08:12,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-11 23:08:12,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2850, loss[loss=0.096, beats_loss=0.01174, ecapa_loss=0.0002245, whisper_loss=0.08201, over 20716.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01126, ecapa_loss=0.0001837, whisper_loss=0.09272, over 3873870.01 frames. ], batch size: 88, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:08:16,272 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-11 23:08:33,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1332920.0, ans=0.0 2024-08-11 23:08:43,232 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 23:08:51,567 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-11 23:08:57,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.704e+01 2.991e+01 3.400e+01 6.217e+01, threshold=5.982e+01, percent-clipped=1.0 2024-08-11 23:09:09,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-11 23:09:24,486 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-11 23:09:26,490 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 13 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 23:09:33,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2900, loss[loss=0.106, beats_loss=0.01131, ecapa_loss=0.0001658, whisper_loss=0.09299, over 17915.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01125, ecapa_loss=0.0001845, whisper_loss=0.09234, over 3878120.73 frames. ], batch size: 70, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:09:47,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=15.0 2024-08-11 23:09:54,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1333420.0, ans=0.125 2024-08-11 23:10:02,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1333420.0, ans=0.125 2024-08-11 23:10:27,354 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-11 23:10:30,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-11 23:10:33,090 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 23:10:33,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-08-11 23:10:43,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1333720.0, ans=0.125 2024-08-11 23:10:54,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 2950, loss[loss=0.1064, beats_loss=0.01161, ecapa_loss=0.0002065, whisper_loss=0.09271, over 21889.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0112, ecapa_loss=0.0001853, whisper_loss=0.09296, over 3901175.99 frames. ], batch size: 91, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:10:56,851 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-11 23:11:04,685 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 23:11:08,346 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-11 23:11:17,791 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-11 23:11:29,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2024-08-11 23:11:34,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1334020.0, ans=0.125 2024-08-11 23:11:40,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.175e+01 2.642e+01 2.977e+01 3.342e+01 4.548e+01, threshold=5.953e+01, percent-clipped=0.0 2024-08-11 23:11:43,844 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-11 23:12:14,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1334320.0, ans=0.0 2024-08-11 23:12:15,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3000, loss[loss=0.09649, beats_loss=0.009537, ecapa_loss=0.0001772, whisper_loss=0.08518, over 14558.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01124, ecapa_loss=0.0001846, whisper_loss=0.09216, over 3889755.54 frames. ], batch size: 53, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:12:15,925 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-11 23:12:58,442 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006225, whisper_loss=0.2505, over 922467.00 frames. 2024-08-11 23:13:14,994 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on SV_voxceleb1: loss=0.004936, beats_loss=0, ecapa_loss=0.0004936, whisper_loss=0, over 939242.00 frames. 2024-08-11 23:15:19,819 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on AT_audioset: loss=0.02462, beats_loss=0.02462, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-11 23:15:19,824 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-11 23:15:27,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. 
limit=22.5 2024-08-11 23:15:31,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1334320.0, ans=0.125 2024-08-11 23:15:42,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1334420.0, ans=0.125 2024-08-11 23:15:47,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1334420.0, ans=0.125 2024-08-11 23:15:50,743 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-11 23:15:56,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=15.0 2024-08-11 23:16:30,652 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-11 23:16:39,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3050, loss[loss=0.08508, beats_loss=0.0133, ecapa_loss=0.0001774, whisper_loss=0.07, over 14985.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01116, ecapa_loss=0.0001855, whisper_loss=0.09267, over 3883115.25 frames. 
], batch size: 60, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:17:11,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1335020.0, ans=0.1 2024-08-11 23:17:21,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.698e+01 2.929e+01 3.403e+01 4.861e+01, threshold=5.858e+01, percent-clipped=0.0 2024-08-11 23:17:38,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1335220.0, ans=0.2 2024-08-11 23:17:42,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1335220.0, ans=0.0 2024-08-11 23:17:53,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3100, loss[loss=0.0966, beats_loss=0.01089, ecapa_loss=0.000169, whisper_loss=0.08402, over 19304.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0112, ecapa_loss=0.0001854, whisper_loss=0.09245, over 3851648.90 frames. ], batch size: 76, lr: 6.42e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:17:55,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1335320.0, ans=0.125 2024-08-11 23:18:04,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1335320.0, ans=10.0 2024-08-11 23:18:12,260 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
26 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-11 23:18:53,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1335720.0, ans=0.125 2024-08-11 23:18:53,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1335720.0, ans=0.125 2024-08-11 23:19:01,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1335720.0, ans=0.0 2024-08-11 23:19:06,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-11 23:19:10,236 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3150, loss[loss=0.1008, beats_loss=0.009219, ecapa_loss=0.0002114, whisper_loss=0.0895, over 17421.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01124, ecapa_loss=0.000186, whisper_loss=0.09263, over 3842397.65 frames. ], batch size: 67, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:19:17,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1335820.0, ans=0.025 2024-08-11 23:19:28,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1335920.0, ans=0.125 2024-08-11 23:19:31,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1335920.0, ans=0.07 2024-08-11 23:19:31,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. 
limit=22.5 2024-08-11 23:19:52,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.066e+01 2.477e+01 2.769e+01 3.279e+01 6.467e+01, threshold=5.538e+01, percent-clipped=1.0 2024-08-11 23:20:13,138 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-11 23:20:19,100 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-11 23:20:24,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3200, loss[loss=0.1048, beats_loss=0.009531, ecapa_loss=0.0002508, whisper_loss=0.0928, over 19319.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01125, ecapa_loss=0.0001853, whisper_loss=0.09307, over 3823853.07 frames. ], batch size: 79, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:20:44,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1336420.0, ans=0.125 2024-08-11 23:20:48,737 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-11 23:20:53,114 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-11 23:20:53,419 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.192e-01 2024-08-11 23:21:27,016 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-11 23:21:37,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3250, loss[loss=0.1104, beats_loss=0.01226, ecapa_loss=0.0001578, whisper_loss=0.09654, over 22531.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01123, ecapa_loss=0.000185, whisper_loss=0.09322, over 3825140.69 frames. 
], batch size: 91, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:21:48,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1336820.0, ans=0.0 2024-08-11 23:21:56,329 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-11 23:22:12,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1337020.0, ans=0.125 2024-08-11 23:22:18,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.451e+01 2.863e+01 3.216e+01 4.803e+01, threshold=5.726e+01, percent-clipped=0.0 2024-08-11 23:22:24,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-08-11 23:22:31,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1337120.0, ans=0.1 2024-08-11 23:22:52,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3300, loss[loss=0.1165, beats_loss=0.009102, ecapa_loss=0.0002067, whisper_loss=0.1054, over 16917.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01118, ecapa_loss=0.0001862, whisper_loss=0.09397, over 3839213.18 frames. ], batch size: 67, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:23:28,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-08-11 23:23:40,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=22.5 2024-08-11 23:24:06,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3350, loss[loss=0.1078, beats_loss=0.008505, ecapa_loss=0.0001995, whisper_loss=0.09728, over 17591.00 frames. 
], tot_loss[loss=0.1074, beats_loss=0.0111, ecapa_loss=0.0001859, whisper_loss=0.09443, over 3853843.78 frames. ], batch size: 69, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:24:19,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1337920.0, ans=0.0 2024-08-11 23:24:28,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1337920.0, ans=0.0 2024-08-11 23:24:38,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1338020.0, ans=0.125 2024-08-11 23:24:45,858 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.630e+01 2.898e+01 3.415e+01 6.649e+01, threshold=5.796e+01, percent-clipped=1.0 2024-08-11 23:24:53,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1338120.0, ans=0.1 2024-08-11 23:25:02,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1338220.0, ans=0.125 2024-08-11 23:25:03,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2024-08-11 23:25:05,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1338220.0, ans=0.125 2024-08-11 23:25:17,462 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3400, loss[loss=0.117, beats_loss=0.01044, ecapa_loss=0.0001608, whisper_loss=0.105, over 22480.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.0111, ecapa_loss=0.0001856, whisper_loss=0.09328, over 3845552.26 frames. 
], batch size: 87, lr: 6.41e-03, grad_scale: 1.152921504606847e+18 2024-08-11 23:25:19,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1338320.0, ans=0.0 2024-08-11 23:25:20,086 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-11 23:25:26,996 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 23:25:38,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-08-11 23:25:51,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=15.0 2024-08-11 23:26:00,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2024-08-11 23:26:01,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1338620.0, ans=0.125 2024-08-11 23:26:04,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1338620.0, ans=0.125 2024-08-11 23:26:15,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1338720.0, ans=0.07 2024-08-11 23:26:18,409 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 23:26:20,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1338720.0, ans=0.0 2024-08-11 23:26:27,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3450, loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001786, whisper_loss=0.09111, over 16072.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001876, whisper_loss=0.09278, over 3835875.45 frames. ], batch size: 60, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:26:32,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-08-11 23:26:56,019 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-11 23:26:58,946 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-11 23:27:08,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.641e+01 2.848e+01 3.378e+01 1.355e+02, threshold=5.696e+01, percent-clipped=1.0 2024-08-11 23:27:09,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.52 vs. limit=15.0 2024-08-11 23:27:18,940 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-11 23:27:25,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2024-08-11 23:27:27,422 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-11 23:27:38,880 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3500, loss[loss=0.09393, beats_loss=0.01117, ecapa_loss=0.000203, whisper_loss=0.08073, over 21693.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001879, whisper_loss=0.09256, over 3857216.38 frames. ], batch size: 90, lr: 6.41e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:27:45,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-11 23:27:49,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1339320.0, ans=0.125 2024-08-11 23:28:11,579 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-11 23:28:18,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1339520.0, ans=0.05 2024-08-11 23:28:28,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1339620.0, ans=0.125 2024-08-11 23:28:50,359 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3550, loss[loss=0.1205, beats_loss=0.01127, ecapa_loss=0.0001475, whisper_loss=0.1077, over 16699.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01122, ecapa_loss=0.0001859, whisper_loss=0.09203, over 3870082.35 frames. ], batch size: 62, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:29:00,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1339820.0, ans=0.125 2024-08-11 23:29:11,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1339920.0, ans=0.125 2024-08-11 23:29:20,061 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
13 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-11 23:29:32,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.587e+01 2.900e+01 3.239e+01 4.496e+01, threshold=5.800e+01, percent-clipped=0.0 2024-08-11 23:29:32,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1340120.0, ans=0.1 2024-08-11 23:29:48,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1340220.0, ans=0.125 2024-08-11 23:29:50,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1340220.0, ans=0.0 2024-08-11 23:29:53,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1340220.0, ans=0.2 2024-08-11 23:29:55,163 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-11 23:29:56,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.76 vs. limit=15.0 2024-08-11 23:30:06,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3600, loss[loss=0.1155, beats_loss=0.01135, ecapa_loss=0.0001608, whisper_loss=0.1026, over 19629.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01115, ecapa_loss=0.0001877, whisper_loss=0.09271, over 3871693.53 frames. ], batch size: 76, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:30:18,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.24 vs. 
limit=15.0 2024-08-11 23:30:21,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=1340420.0, ans=0.1 2024-08-11 23:30:23,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1340420.0, ans=0.125 2024-08-11 23:30:33,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1340420.0, ans=0.2 2024-08-11 23:30:40,865 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-11 23:30:56,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2024-08-11 23:31:23,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3650, loss[loss=0.1001, beats_loss=0.01208, ecapa_loss=0.0001598, whisper_loss=0.08646, over 14416.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01111, ecapa_loss=0.0001875, whisper_loss=0.09263, over 3858001.72 frames. ], batch size: 55, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:31:23,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1340820.0, ans=0.2 2024-08-11 23:31:40,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1340920.0, ans=0.2 2024-08-11 23:31:41,534 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 23:31:53,336 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.363e+00 2024-08-11 23:31:58,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. 
limit=15.0 2024-08-11 23:31:59,423 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-11 23:32:08,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.076e+01 2.620e+01 2.825e+01 3.170e+01 6.141e+01, threshold=5.649e+01, percent-clipped=1.0 2024-08-11 23:32:15,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1341120.0, ans=0.0 2024-08-11 23:32:20,217 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-11 23:32:20,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1341120.0, ans=0.2 2024-08-11 23:32:28,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1341220.0, ans=0.125 2024-08-11 23:32:31,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1341220.0, ans=0.0 2024-08-11 23:32:41,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3700, loss[loss=0.09065, beats_loss=0.01378, ecapa_loss=0.0001595, whisper_loss=0.07528, over 14212.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001881, whisper_loss=0.09247, over 3852596.91 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:32:55,906 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-11 23:33:08,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1341420.0, ans=0.125 2024-08-11 23:33:09,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1341420.0, ans=0.035 2024-08-11 23:33:30,342 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-11 23:33:30,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=24.17 vs. limit=22.5 2024-08-11 23:33:57,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3750, loss[loss=0.1352, beats_loss=0.009562, ecapa_loss=0.000154, whisper_loss=0.1241, over 16159.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0112, ecapa_loss=0.000187, whisper_loss=0.09266, over 3853424.19 frames. ], batch size: 57, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:34:00,539 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-11 23:34:08,584 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.171e+02 2024-08-11 23:34:22,267 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-11 23:34:38,808 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-11 23:34:41,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.517e+01 2.756e+01 3.054e+01 4.813e+01, threshold=5.513e+01, percent-clipped=0.0 2024-08-11 23:34:41,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1342120.0, ans=0.0 2024-08-11 23:34:55,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=15.0 2024-08-11 23:34:56,261 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 7 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-11 23:34:59,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1342220.0, ans=0.125 2024-08-11 23:35:03,285 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-11 23:35:11,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3800, loss[loss=0.09174, beats_loss=0.01182, ecapa_loss=0.0001714, whisper_loss=0.07821, over 16039.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01134, ecapa_loss=0.0001867, whisper_loss=0.09228, over 3873141.18 frames. ], batch size: 65, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:35:13,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1342320.0, ans=0.125 2024-08-11 23:35:16,942 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-11 23:36:11,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1342720.0, ans=0.0 2024-08-11 23:36:12,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1342720.0, ans=0.0 2024-08-11 23:36:24,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=15.0 2024-08-11 23:36:25,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3850, loss[loss=0.07382, beats_loss=0.01547, ecapa_loss=0.0001394, whisper_loss=0.05696, over 22680.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0114, ecapa_loss=0.0001856, whisper_loss=0.09144, over 3851247.83 frames. ], batch size: 93, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:36:26,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2024-08-11 23:36:30,727 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-11 23:36:42,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1342920.0, ans=0.125 2024-08-11 23:36:48,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1342920.0, ans=0.125 2024-08-11 23:37:04,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+01 2.584e+01 2.893e+01 3.554e+01 5.203e+01, threshold=5.787e+01, percent-clipped=0.0 2024-08-11 23:37:06,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1343120.0, ans=22.5 2024-08-11 23:37:17,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1343120.0, ans=0.125 2024-08-11 23:37:22,445 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-11 23:37:24,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1343220.0, ans=0.04949747468305833 2024-08-11 23:37:33,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3900, loss[loss=0.1183, beats_loss=0.008668, ecapa_loss=0.0002002, whisper_loss=0.1076, over 21141.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01141, ecapa_loss=0.0001857, whisper_loss=0.09151, over 3892433.35 frames. 
], batch size: 87, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:37:38,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1343320.0, ans=0.125 2024-08-11 23:37:39,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1343320.0, ans=0.125 2024-08-11 23:37:54,456 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-11 23:37:57,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1343420.0, ans=0.125 2024-08-11 23:38:30,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1343720.0, ans=0.125 2024-08-11 23:38:33,039 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 15 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-11 23:38:42,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 3950, loss[loss=0.127, beats_loss=0.008433, ecapa_loss=0.0002346, whisper_loss=0.1162, over 21065.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01133, ecapa_loss=0.0001867, whisper_loss=0.0925, over 3910119.15 frames. ], batch size: 86, lr: 6.40e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:38:53,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1343820.0, ans=0.5 2024-08-11 23:38:59,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. limit=10.0 2024-08-11 23:39:00,139 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-11 23:39:03,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1343920.0, ans=0.05 2024-08-11 23:39:14,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1344020.0, ans=0.125 2024-08-11 23:39:17,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1344020.0, ans=0.1 2024-08-11 23:39:22,170 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.065e+01 2.690e+01 2.958e+01 3.481e+01 5.578e+01, threshold=5.915e+01, percent-clipped=0.0 2024-08-11 23:39:23,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1344120.0, ans=0.0 2024-08-11 23:39:33,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1344120.0, ans=0.0 2024-08-11 23:39:51,047 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4000, loss[loss=0.1086, beats_loss=0.0106, ecapa_loss=0.0001843, whisper_loss=0.0962, over 23217.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01122, ecapa_loss=0.0001877, whisper_loss=0.09281, over 3901474.04 frames. ], batch size: 94, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:40:08,369 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-11 23:40:15,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1344420.0, ans=0.1 2024-08-11 23:40:15,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1344420.0, ans=0.0 2024-08-11 23:41:00,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4050, loss[loss=0.106, beats_loss=0.01159, ecapa_loss=0.0001729, whisper_loss=0.09272, over 20225.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01124, ecapa_loss=0.0001866, whisper_loss=0.09254, over 3886049.43 frames. ], batch size: 81, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:41:26,720 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-11 23:41:39,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.615e+01 3.014e+01 3.367e+01 5.886e+01, threshold=6.027e+01, percent-clipped=0.0 2024-08-11 23:41:40,077 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 33 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-11 23:41:41,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1345120.0, ans=0.125 2024-08-11 23:42:08,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4100, loss[loss=0.1021, beats_loss=0.01472, ecapa_loss=0.0001865, whisper_loss=0.08551, over 19573.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01124, ecapa_loss=0.0001865, whisper_loss=0.09273, over 3891117.44 frames. 
], batch size: 82, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:42:11,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1345320.0, ans=0.0 2024-08-11 23:42:27,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1345420.0, ans=0.125 2024-08-11 23:42:36,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1345520.0, ans=0.0 2024-08-11 23:42:40,944 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-11 23:42:49,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1345620.0, ans=0.025 2024-08-11 23:42:53,445 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-11 23:42:55,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-11 23:43:14,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1345720.0, ans=0.2 2024-08-11 23:43:18,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4150, loss[loss=0.1038, beats_loss=0.01401, ecapa_loss=0.0001284, whisper_loss=0.08851, over 23836.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01128, ecapa_loss=0.0001863, whisper_loss=0.09237, over 3880705.06 frames. ], batch size: 91, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:43:21,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1345820.0, ans=0.2 2024-08-11 23:43:28,333 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-11 23:43:37,857 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.832e-01 2024-08-11 23:43:40,587 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-11 23:43:53,270 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-11 23:43:58,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+01 2.669e+01 2.869e+01 3.344e+01 4.634e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-11 23:43:59,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2024-08-11 23:44:00,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1346120.0, ans=0.0 2024-08-11 23:44:21,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1346220.0, ans=0.125 2024-08-11 23:44:27,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4200, loss[loss=0.1232, beats_loss=0.01155, ecapa_loss=0.0002032, whisper_loss=0.1097, over 21560.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0113, ecapa_loss=0.0001856, whisper_loss=0.09251, over 3847757.62 frames. 
], batch size: 87, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:44:33,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1346320.0, ans=0.125 2024-08-11 23:44:41,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1346420.0, ans=0.1 2024-08-11 23:44:44,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1346420.0, ans=0.2 2024-08-11 23:44:52,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-11 23:44:54,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-11 23:45:00,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-11 23:45:14,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1346620.0, ans=0.0 2024-08-11 23:45:22,736 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-11 23:45:36,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4250, loss[loss=0.1099, beats_loss=0.01153, ecapa_loss=0.0001816, whisper_loss=0.09652, over 22543.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01122, ecapa_loss=0.0001858, whisper_loss=0.09271, over 3875419.54 frames. 
], batch size: 90, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:45:43,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1346820.0, ans=0.125 2024-08-11 23:45:46,711 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.392e+00 2024-08-11 23:45:50,738 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-11 23:46:10,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1347020.0, ans=0.1 2024-08-11 23:46:17,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.600e+01 2.838e+01 3.253e+01 4.399e+01, threshold=5.676e+01, percent-clipped=0.0 2024-08-11 23:46:21,763 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-11 23:46:23,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1347120.0, ans=0.2 2024-08-11 23:46:24,396 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-11 23:46:42,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1347220.0, ans=0.125 2024-08-11 23:46:46,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4300, loss[loss=0.09677, beats_loss=0.01154, ecapa_loss=0.0002093, whisper_loss=0.08314, over 17887.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01117, ecapa_loss=0.0001859, whisper_loss=0.09274, over 3891026.22 frames. ], batch size: 71, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:46:52,006 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 23:46:54,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1347320.0, ans=0.1 2024-08-11 23:46:58,999 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-11 23:47:14,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1347520.0, ans=0.0 2024-08-11 23:47:14,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1347520.0, ans=0.0 2024-08-11 23:47:20,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1347520.0, ans=0.1 2024-08-11 23:47:43,883 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-11 23:47:56,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4350, loss[loss=0.1062, beats_loss=0.01049, ecapa_loss=0.0001677, whisper_loss=0.09399, over 17823.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001862, whisper_loss=0.09222, over 3876970.23 frames. 
], batch size: 69, lr: 6.39e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:47:59,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1347820.0, ans=0.0 2024-08-11 23:48:03,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1347820.0, ans=0.125 2024-08-11 23:48:22,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1348020.0, ans=0.0 2024-08-11 23:48:34,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1348020.0, ans=0.1 2024-08-11 23:48:36,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.570e+01 2.850e+01 3.397e+01 5.504e+01, threshold=5.701e+01, percent-clipped=0.0 2024-08-11 23:48:39,109 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 23:48:47,585 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-11 23:48:49,012 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-11 23:49:04,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1348220.0, ans=0.0 2024-08-11 23:49:07,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4400, loss[loss=0.1189, beats_loss=0.01072, ecapa_loss=0.0001991, whisper_loss=0.1061, over 22443.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0001867, whisper_loss=0.09205, over 3886262.36 frames. 
], batch size: 93, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:49:07,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1348320.0, ans=0.0 2024-08-11 23:49:10,234 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-11 23:49:10,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1348320.0, ans=0.2 2024-08-11 23:49:11,553 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-11 23:49:13,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1348320.0, ans=0.2 2024-08-11 23:49:19,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1348320.0, ans=0.125 2024-08-11 23:49:22,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1348420.0, ans=0.0 2024-08-11 23:49:23,222 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-11 23:49:27,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=12.0 2024-08-11 23:49:38,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1348520.0, ans=0.1 2024-08-11 23:49:44,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2024-08-11 23:49:49,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.86 vs. 
limit=15.0 2024-08-11 23:49:54,197 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-11 23:50:02,714 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-11 23:50:19,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4450, loss[loss=0.1044, beats_loss=0.01208, ecapa_loss=0.0001615, whisper_loss=0.09074, over 22521.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01122, ecapa_loss=0.0001845, whisper_loss=0.09233, over 3886777.75 frames. ], batch size: 88, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:50:22,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2024-08-11 23:50:23,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=12.0 2024-08-11 23:50:27,076 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-11 23:50:51,432 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-11 23:50:52,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1349020.0, ans=0.125 2024-08-11 23:50:57,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1349020.0, ans=0.125 2024-08-11 23:51:00,434 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.285e+02 2024-08-11 23:51:01,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.191e+01 2.643e+01 3.000e+01 3.439e+01 5.029e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-11 23:51:03,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=12.0 2024-08-11 23:51:18,799 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-11 23:51:23,299 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-11 23:51:29,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4500, loss[loss=0.1089, beats_loss=0.01, ecapa_loss=0.0002304, whisper_loss=0.09662, over 16094.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01128, ecapa_loss=0.0001853, whisper_loss=0.09228, over 3940695.23 frames. ], batch size: 64, lr: 6.38e-03, grad_scale: 5.764607523034235e+17 2024-08-11 23:51:30,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.42 vs. 
limit=15.0 2024-08-11 23:51:33,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1349320.0, ans=0.0 2024-08-11 23:51:39,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1349320.0, ans=0.125 2024-08-11 23:51:41,349 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-11 23:51:48,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2024-08-11 23:51:55,181 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-11 23:52:15,677 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-11 23:52:28,195 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-11 23:52:33,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1349720.0, ans=0.125 2024-08-11 23:52:34,747 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-11 23:52:38,980 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4550, loss[loss=0.109, beats_loss=0.009691, ecapa_loss=0.0001816, whisper_loss=0.09748, over 21185.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01125, ecapa_loss=0.0001837, whisper_loss=0.09241, over 3929195.36 frames. 
], batch size: 82, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:52:44,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1349820.0, ans=0.125
2024-08-11 23:52:51,512 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-11 23:53:08,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0
2024-08-11 23:53:19,059 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.529e+01 2.869e+01 3.379e+01 6.425e+01, threshold=5.739e+01, percent-clipped=1.0
2024-08-11 23:53:31,105 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 21 from Vox, 23 from AS
2024-08-11 23:53:48,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4600, loss[loss=0.08221, beats_loss=0.01235, ecapa_loss=0.0001754, whisper_loss=0.06811, over 16550.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01125, ecapa_loss=0.0001856, whisper_loss=0.09189, over 3891904.95 frames. ], batch size: 68, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:54:02,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1350420.0, ans=0.0
2024-08-11 23:54:09,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1350420.0, ans=0.125
2024-08-11 23:54:14,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1350420.0, ans=0.0
2024-08-11 23:54:21,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1350520.0, ans=0.0
2024-08-11 23:54:28,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1350520.0, ans=0.125
2024-08-11 23:54:37,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1350620.0, ans=0.0
2024-08-11 23:54:40,655 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 from AS
2024-08-11 23:54:49,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1350720.0, ans=0.125
2024-08-11 23:55:00,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4650, loss[loss=0.1028, beats_loss=0.01139, ecapa_loss=0.0001949, whisper_loss=0.08947, over 21592.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01122, ecapa_loss=0.0001866, whisper_loss=0.09212, over 3884033.22 frames. ], batch size: 88, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:55:06,928 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-11 23:55:09,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0
2024-08-11 23:55:14,481 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 15 from Vox, 27 from AS
2024-08-11 23:55:16,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1350920.0, ans=0.09899494936611666
2024-08-11 23:55:29,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351020.0, ans=0.1
2024-08-11 23:55:38,951 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 16 from LS+wenet, 28 from Vox, 34 from AS
2024-08-11 23:55:43,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.678e+01 3.059e+01 3.452e+01 5.229e+01, threshold=6.118e+01, percent-clipped=0.0
2024-08-11 23:55:50,003 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 from AS
2024-08-11 23:56:01,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1351220.0, ans=0.125
2024-08-11 23:56:13,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4700, loss[loss=0.0972, beats_loss=0.01313, ecapa_loss=0.000189, whisper_loss=0.08217, over 20468.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01122, ecapa_loss=0.0001858, whisper_loss=0.09227, over 3897602.11 frames. ], batch size: 82, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:56:25,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. limit=22.5
2024-08-11 23:56:53,972 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 17 from Vox, 28 from AS
2024-08-11 23:56:55,323 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 from AS
2024-08-11 23:57:07,121 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS
2024-08-11 23:57:09,589 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 18 from Vox, 41 from AS
2024-08-11 23:57:21,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1351720.0, ans=0.125
2024-08-11 23:57:27,179 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4750, loss[loss=0.09967, beats_loss=0.01414, ecapa_loss=0.0001815, whisper_loss=0.08372, over 21857.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0112, ecapa_loss=0.0001868, whisper_loss=0.09217, over 3885170.93 frames. ], batch size: 92, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:57:27,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1351820.0, ans=0.1
2024-08-11 23:57:31,566 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 18 from LS+wenet, 31 from Vox, 45 from AS
2024-08-11 23:57:32,988 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS
2024-08-11 23:57:37,858 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 from AS
2024-08-11 23:57:43,550 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 from AS
2024-08-11 23:57:56,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1352020.0, ans=0.1
2024-08-11 23:57:56,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.08 vs. limit=10.0
2024-08-11 23:58:09,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.685e+01 3.044e+01 3.744e+01 5.202e+01, threshold=6.087e+01, percent-clipped=0.0
2024-08-11 23:58:12,849 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 16 from LS+wenet, 19 from Vox, 46 from AS
2024-08-11 23:58:17,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1352120.0, ans=0.125
2024-08-11 23:58:17,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1352120.0, ans=0.125
2024-08-11 23:58:20,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352120.0, ans=0.1
2024-08-11 23:58:41,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4800, loss[loss=0.1065, beats_loss=0.01322, ecapa_loss=0.0001591, whisper_loss=0.09174, over 16073.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01126, ecapa_loss=0.0001861, whisper_loss=0.09174, over 3897721.36 frames. ], batch size: 63, lr: 6.38e-03, grad_scale: 5.764607523034235e+17
2024-08-11 23:58:58,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1352420.0, ans=0.07
2024-08-11 23:59:04,304 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-11 23:59:08,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1352420.0, ans=10.0
2024-08-11 23:59:30,653 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 from AS
2024-08-11 23:59:36,453 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
21 from LS+wenet, 22 from Vox, 28 from AS
2024-08-11 23:59:48,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1352720.0, ans=0.0
2024-08-11 23:59:50,779 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.629e-03
2024-08-11 23:59:57,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4850, loss[loss=0.1123, beats_loss=0.01049, ecapa_loss=0.0001578, whisper_loss=0.1002, over 17062.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01124, ecapa_loss=0.0001865, whisper_loss=0.09167, over 3900226.41 frames. ], batch size: 64, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:00:00,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1352820.0, ans=0.1
2024-08-12 00:00:01,633 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS
2024-08-12 00:00:10,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1352920.0, ans=0.125
2024-08-12 00:00:30,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0
2024-08-12 00:00:39,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.218e+01 2.736e+01 3.106e+01 3.475e+01 1.081e+02, threshold=6.213e+01, percent-clipped=2.0
2024-08-12 00:00:44,204 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:00:56,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1353220.0, ans=0.125
2024-08-12 00:00:58,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2024-08-12 00:00:59,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1353220.0, ans=0.1
2024-08-12 00:01:05,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1353220.0, ans=0.125
2024-08-12 00:01:10,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4900, loss[loss=0.1023, beats_loss=0.01187, ecapa_loss=0.0001867, whisper_loss=0.08859, over 19331.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01125, ecapa_loss=0.0001866, whisper_loss=0.09211, over 3900054.53 frames. ], batch size: 76, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:01:15,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0
2024-08-12 00:01:23,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1353420.0, ans=0.125
2024-08-12 00:01:31,094 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 00:01:33,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0
2024-08-12 00:01:37,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=12.0
2024-08-12 00:01:50,447 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 from AS
2024-08-12 00:01:55,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1353620.0, ans=0.1
2024-08-12 00:02:09,710 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 16 from Vox, 46 from AS
2024-08-12 00:02:24,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 4950, loss[loss=0.1059, beats_loss=0.01006, ecapa_loss=0.0001892, whisper_loss=0.094, over 22899.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01119, ecapa_loss=0.000188, whisper_loss=0.09211, over 3879799.32 frames. ], batch size: 87, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:02:26,324 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 from AS
2024-08-12 00:02:41,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1353920.0, ans=0.2
2024-08-12 00:02:47,390 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 from AS
2024-08-12 00:02:54,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0
2024-08-12 00:02:55,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1354020.0, ans=0.125
2024-08-12 00:02:55,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1354020.0, ans=22.5
2024-08-12 00:03:03,570 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 from AS
2024-08-12 00:03:07,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.588e+01 2.867e+01 3.231e+01 4.752e+01, threshold=5.733e+01, percent-clipped=0.0
2024-08-12 00:03:08,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1354120.0, ans=0.125
2024-08-12 00:03:09,461 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS
2024-08-12 00:03:10,728 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS
2024-08-12 00:03:25,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1354220.0, ans=0.125
2024-08-12 00:03:36,353 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 from AS
2024-08-12 00:03:37,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5000, loss[loss=0.09865, beats_loss=0.01271, ecapa_loss=0.0001882, whisper_loss=0.08405, over 22743.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01113, ecapa_loss=0.0001889, whisper_loss=0.0929, over 3882857.19 frames. ], batch size: 95, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:03:57,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0
2024-08-12 00:04:05,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1354520.0, ans=0.125
2024-08-12 00:04:09,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1354520.0, ans=0.02
2024-08-12 00:04:09,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1354520.0, ans=0.0
2024-08-12 00:04:11,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=15.0
2024-08-12 00:04:25,975 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 from AS
2024-08-12 00:04:31,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1354620.0, ans=0.0
2024-08-12 00:04:48,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5050, loss[loss=0.09707, beats_loss=0.01184, ecapa_loss=0.0001829, whisper_loss=0.0834, over 17329.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01124, ecapa_loss=0.0001866, whisper_loss=0.09263, over 3908357.79 frames. ], batch size: 70, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:04:52,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1354820.0, ans=0.1
2024-08-12 00:04:54,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5
2024-08-12 00:04:56,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1354820.0, ans=0.125
2024-08-12 00:05:21,373 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts.
21 from LS+wenet, 30 from Vox, 30 from AS
2024-08-12 00:05:29,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.677e+01 3.041e+01 3.640e+01 6.697e+01, threshold=6.081e+01, percent-clipped=3.0
2024-08-12 00:05:42,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1355120.0, ans=0.2
2024-08-12 00:05:53,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1355220.0, ans=0.125
2024-08-12 00:06:00,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5100, loss[loss=0.09731, beats_loss=0.01238, ecapa_loss=0.0002576, whisper_loss=0.08236, over 15681.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01121, ecapa_loss=0.0001868, whisper_loss=0.09352, over 3902966.15 frames. ], batch size: 66, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:06:05,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5
2024-08-12 00:06:21,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1355420.0, ans=0.125
2024-08-12 00:06:39,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1355520.0, ans=0.025
2024-08-12 00:06:45,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1355620.0, ans=0.2
2024-08-12 00:06:47,928 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 16 from Vox, 42 from AS
2024-08-12 00:06:50,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1355620.0, ans=0.0
2024-08-12 00:06:59,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1355720.0, ans=0.0
2024-08-12 00:07:09,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5150, loss[loss=0.1067, beats_loss=0.0105, ecapa_loss=0.0001912, whisper_loss=0.09428, over 22197.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01124, ecapa_loss=0.0001869, whisper_loss=0.09328, over 3936918.38 frames. ], batch size: 90, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:07:21,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0
2024-08-12 00:07:24,399 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 from AS
2024-08-12 00:07:27,034 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS
2024-08-12 00:07:29,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.04 vs. limit=22.5
2024-08-12 00:07:31,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1355920.0, ans=0.0
2024-08-12 00:07:31,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1355920.0, ans=0.125
2024-08-12 00:07:33,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=1355920.0, ans=0.2
2024-08-12 00:07:34,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1355920.0, ans=0.125
2024-08-12 00:07:34,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1355920.0, ans=0.125
2024-08-12 00:07:47,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0
2024-08-12 00:07:50,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.961e+01 3.572e+01 5.621e+01, threshold=5.922e+01, percent-clipped=0.0
2024-08-12 00:07:58,246 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 15 from LS+wenet, 29 from Vox, 33 from AS
2024-08-12 00:08:01,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1356120.0, ans=0.0
2024-08-12 00:08:20,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5200, loss[loss=0.09472, beats_loss=0.0133, ecapa_loss=0.0001649, whisper_loss=0.07977, over 21922.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.0112, ecapa_loss=0.0001867, whisper_loss=0.09306, over 3912273.88 frames. ], batch size: 91, lr: 6.37e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:08:24,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1356320.0, ans=0.125
2024-08-12 00:08:25,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356320.0, ans=0.1
2024-08-12 00:08:26,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1356320.0, ans=0.125
2024-08-12 00:08:28,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1356320.0, ans=0.125
2024-08-12 00:08:29,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1356320.0, ans=0.125
2024-08-12 00:08:31,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1356320.0, ans=0.125
2024-08-12 00:08:33,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1356420.0, ans=0.1
2024-08-12 00:08:33,845 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:08:36,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1356420.0, ans=0.0
2024-08-12 00:08:47,566 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 from AS
2024-08-12 00:08:48,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0
2024-08-12 00:08:50,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=12.0
2024-08-12 00:08:53,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1356520.0, ans=0.2
2024-08-12 00:09:23,809 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 14 from Vox, 32 from AS
2024-08-12 00:09:27,848 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 26 from Vox, 34 from AS
2024-08-12 00:09:30,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5250, loss[loss=0.09704, beats_loss=0.0118, ecapa_loss=0.0001675, whisper_loss=0.08356, over 17612.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01122, ecapa_loss=0.0001862, whisper_loss=0.09227, over 3881307.37 frames. ], batch size: 68, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:09:42,353 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS
2024-08-12 00:09:43,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1356920.0, ans=0.1
2024-08-12 00:09:52,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1356920.0, ans=0.1
2024-08-12 00:09:53,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1356920.0, ans=0.125
2024-08-12 00:09:54,956 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.130e-01
2024-08-12 00:10:07,648 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts.
23 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 00:10:11,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.537e+01 2.858e+01 3.258e+01 4.916e+01, threshold=5.717e+01, percent-clipped=0.0
2024-08-12 00:10:30,755 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 18 from Vox, 20 from AS
2024-08-12 00:10:40,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5300, loss[loss=0.1257, beats_loss=0.009279, ecapa_loss=0.0001989, whisper_loss=0.1144, over 16808.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01114, ecapa_loss=0.0001871, whisper_loss=0.09275, over 3871175.35 frames. ], batch size: 67, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:11:03,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1357420.0, ans=0.125
2024-08-12 00:11:18,753 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 30 from Vox, 33 from AS
2024-08-12 00:11:19,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1357520.0, ans=0.0
2024-08-12 00:11:28,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1357620.0, ans=0.125
2024-08-12 00:11:49,246 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5350, loss[loss=0.1101, beats_loss=0.01066, ecapa_loss=0.0001732, whisper_loss=0.09769, over 14308.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01109, ecapa_loss=0.0001865, whisper_loss=0.09244, over 3855971.31 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:11:49,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1357820.0, ans=0.0
2024-08-12 00:11:51,054 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 from AS
2024-08-12 00:11:56,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1357820.0, ans=0.0
2024-08-12 00:12:00,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1357820.0, ans=0.025
2024-08-12 00:12:03,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1357920.0, ans=0.0
2024-08-12 00:12:03,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=12.0
2024-08-12 00:12:04,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1357920.0, ans=0.125
2024-08-12 00:12:14,119 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 from AS
2024-08-12 00:12:16,783 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 11 from LS+wenet, 20 from Vox, 28 from AS
2024-08-12 00:12:18,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1358020.0, ans=0.0
2024-08-12 00:12:30,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.572e+01 2.816e+01 3.245e+01 5.813e+01, threshold=5.633e+01, percent-clipped=1.0
2024-08-12 00:13:01,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5400, loss[loss=0.1023, beats_loss=0.00981, ecapa_loss=0.0001642, whisper_loss=0.09085, over 18494.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.0001858, whisper_loss=0.09275, over 3871791.54 frames. ], batch size: 67, lr: 6.36e-03, grad_scale: 5.764607523034235e+17
2024-08-12 00:13:11,272 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 from AS
2024-08-12 00:13:13,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0
2024-08-12 00:13:33,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1358520.0, ans=0.0
2024-08-12 00:14:16,203 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-12 00:14:16,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1358720.0, ans=0.2
2024-08-12 00:14:18,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5450, loss[loss=0.07721, beats_loss=0.0107, ecapa_loss=0.0001563, whisper_loss=0.06495, over 15485.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01106, ecapa_loss=0.0001875, whisper_loss=0.09224, over 3852431.36 frames. ], batch size: 61, lr: 6.36e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:14:23,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1358820.0, ans=0.125
2024-08-12 00:14:24,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1358820.0, ans=0.125
2024-08-12 00:14:27,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0
2024-08-12 00:14:39,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1358920.0, ans=15.0
2024-08-12 00:15:02,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1359020.0, ans=0.0
2024-08-12 00:15:02,642 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:15:05,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.617e+01 2.957e+01 3.359e+01 7.305e+01, threshold=5.914e+01, percent-clipped=2.0
2024-08-12 00:15:09,358 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS
2024-08-12 00:15:18,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=12.0
2024-08-12 00:15:46,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5500, loss[loss=0.1133, beats_loss=0.01254, ecapa_loss=0.0001977, whisper_loss=0.09874, over 22060.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01111, ecapa_loss=0.0001884, whisper_loss=0.09221, over 3871835.80 frames.
], batch size: 93, lr: 6.36e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:16:30,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1359520.0, ans=0.125
2024-08-12 00:16:50,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1359620.0, ans=0.025
2024-08-12 00:17:18,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1359820.0, ans=0.125
2024-08-12 00:17:19,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5550, loss[loss=0.1079, beats_loss=0.009206, ecapa_loss=0.0001793, whisper_loss=0.0969, over 19233.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01114, ecapa_loss=0.0001866, whisper_loss=0.09202, over 3890848.40 frames. ], batch size: 75, lr: 6.36e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:17:20,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1359820.0, ans=0.5
2024-08-12 00:17:22,602 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 00:17:24,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1359820.0, ans=0.0
2024-08-12 00:17:29,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1359820.0, ans=0.125
2024-08-12 00:17:51,354 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-136000.pt
2024-08-12 00:18:14,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.662e+01 3.000e+01 3.511e+01 5.450e+01, threshold=6.001e+01, percent-clipped=0.0
2024-08-12 00:18:23,173 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:18:26,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1360120.0, ans=0.2
2024-08-12 00:18:41,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0
2024-08-12 00:18:44,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1360220.0, ans=0.0
2024-08-12 00:18:49,509 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 from AS
2024-08-12 00:18:49,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1360220.0, ans=0.05
2024-08-12 00:18:53,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5600, loss[loss=0.1013, beats_loss=0.01222, ecapa_loss=0.0002293, whisper_loss=0.08677, over 21808.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001872, whisper_loss=0.09207, over 3869305.30 frames. ], batch size: 93, lr: 6.36e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:19:09,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1360320.0, ans=0.1
2024-08-12 00:19:15,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1360420.0, ans=0.0
2024-08-12 00:19:18,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1360420.0, ans=0.0
2024-08-12 00:19:20,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1360420.0, ans=0.0
2024-08-12 00:19:27,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=12.0
2024-08-12 00:19:32,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1360520.0, ans=0.125
2024-08-12 00:19:41,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1360520.0, ans=0.0
2024-08-12 00:19:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1360520.0, ans=0.1
2024-08-12 00:19:46,456 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 from AS
2024-08-12 00:19:59,445 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 22 from Vox, 17 from AS
2024-08-12 00:19:59,770 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 00:20:19,136 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 from AS
2024-08-12 00:20:24,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5650, loss[loss=0.09607, beats_loss=0.01077, ecapa_loss=0.0002162, whisper_loss=0.08314, over 15027.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0112, ecapa_loss=0.0001869, whisper_loss=0.0912, over 3889678.11 frames. ], batch size: 63, lr: 6.36e-03, grad_scale: 1.152921504606847e+18
2024-08-12 00:20:30,300 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 15 from Vox, 24 from AS
2024-08-12 00:20:41,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0
2024-08-12 00:20:52,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1361020.0, ans=0.1
2024-08-12 00:20:56,180 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 26 from Vox, 41 from AS
2024-08-12 00:21:04,056 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+01 2.708e+01 3.179e+01 3.775e+01 1.197e+02, threshold=6.358e+01, percent-clipped=2.0
2024-08-12 00:21:04,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1361120.0, ans=0.2
2024-08-12 00:21:06,856 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts.
30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 00:21:26,345 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.108e-01 2024-08-12 00:21:32,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5700, loss[loss=0.1002, beats_loss=0.012, ecapa_loss=0.0001975, whisper_loss=0.08622, over 21882.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01116, ecapa_loss=0.000187, whisper_loss=0.09113, over 3896967.42 frames. ], batch size: 89, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:21:38,594 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 00:22:14,895 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 00:22:19,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1361620.0, ans=0.0 2024-08-12 00:22:26,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1361720.0, ans=0.125 2024-08-12 00:22:28,451 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 00:22:33,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1361720.0, ans=0.125 2024-08-12 00:22:40,219 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5750, loss[loss=0.1013, beats_loss=0.01279, ecapa_loss=0.0001393, whisper_loss=0.08716, over 16188.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01123, ecapa_loss=0.000186, whisper_loss=0.0908, over 3882885.94 frames. 
], batch size: 63, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:22:48,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1361820.0, ans=0.0 2024-08-12 00:23:15,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1362020.0, ans=0.0 2024-08-12 00:23:15,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1362020.0, ans=0.125 2024-08-12 00:23:20,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.574e+01 2.789e+01 3.089e+01 4.490e+01, threshold=5.577e+01, percent-clipped=0.0 2024-08-12 00:23:29,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1362120.0, ans=0.125 2024-08-12 00:23:32,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-12 00:23:49,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5800, loss[loss=0.09611, beats_loss=0.01089, ecapa_loss=0.0002328, whisper_loss=0.08289, over 20913.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01118, ecapa_loss=0.000188, whisper_loss=0.09107, over 3855854.57 frames. ], batch size: 88, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:24:03,316 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 00:24:14,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1362420.0, ans=0.125 2024-08-12 00:24:31,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1362620.0, ans=0.1 2024-08-12 00:24:32,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1362620.0, ans=0.5 2024-08-12 00:24:44,469 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 00:24:58,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5850, loss[loss=0.1329, beats_loss=0.007006, ecapa_loss=0.0002051, whisper_loss=0.1238, over 18842.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0112, ecapa_loss=0.0001882, whisper_loss=0.09102, over 3860879.69 frames. ], batch size: 74, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:24:59,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1362820.0, ans=0.0 2024-08-12 00:25:06,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1362820.0, ans=0.2 2024-08-12 00:25:09,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1362820.0, ans=0.0 2024-08-12 00:25:17,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1362920.0, ans=0.015 2024-08-12 00:25:20,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.64 vs. 
limit=22.5 2024-08-12 00:25:37,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.515e+01 2.804e+01 3.095e+01 4.578e+01, threshold=5.608e+01, percent-clipped=0.0 2024-08-12 00:25:42,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1363120.0, ans=0.1 2024-08-12 00:25:57,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1363220.0, ans=10.0 2024-08-12 00:26:06,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5900, loss[loss=0.1074, beats_loss=0.01251, ecapa_loss=0.0001828, whisper_loss=0.0931, over 20897.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01135, ecapa_loss=0.0001861, whisper_loss=0.09012, over 3850081.73 frames. ], batch size: 87, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:26:12,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1363320.0, ans=0.125 2024-08-12 00:26:13,602 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-12 00:26:22,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1363420.0, ans=0.5 2024-08-12 00:27:10,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1363720.0, ans=0.125 2024-08-12 00:27:12,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1363720.0, ans=0.125 2024-08-12 00:27:14,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 5950, loss[loss=0.09632, beats_loss=0.01253, ecapa_loss=0.0002067, whisper_loss=0.08171, over 22189.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01132, ecapa_loss=0.0001866, whisper_loss=0.09054, over 3868156.89 frames. ], batch size: 92, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:27:22,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1363820.0, ans=0.125 2024-08-12 00:27:28,442 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 00:27:28,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1363920.0, ans=0.2 2024-08-12 00:27:49,514 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 00:27:52,500 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.295e+02 2024-08-12 00:27:53,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+01 2.619e+01 2.853e+01 3.292e+01 6.548e+01, threshold=5.706e+01, percent-clipped=1.0 2024-08-12 00:27:53,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1364120.0, ans=0.125 2024-08-12 00:27:55,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-12 00:27:56,543 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-12 00:28:18,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1364220.0, ans=0.0 2024-08-12 00:28:22,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6000, loss[loss=0.1057, beats_loss=0.01099, ecapa_loss=0.0001973, whisper_loss=0.09274, over 21981.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001869, whisper_loss=0.09199, over 3861741.60 frames. ], batch size: 91, lr: 6.35e-03, grad_scale: 1.152921504606847e+18 2024-08-12 00:28:22,350 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 00:28:34,905 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3551, 3.4564, 3.6323, 2.5329, 0.8893, 4.3962, 3.9528, 0.9235], device='cuda:0') 2024-08-12 00:29:04,101 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on ASR_libri: loss=0.2569, beats_loss=0, ecapa_loss=0.0006172, whisper_loss=0.2508, over 922467.00 frames. 2024-08-12 00:29:22,666 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on SV_voxceleb1: loss=0.005036, beats_loss=0, ecapa_loss=0.0005036, whisper_loss=0, over 939242.00 frames. 2024-08-12 00:31:25,891 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 00:31:25,895 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 00:31:31,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1364320.0, ans=0.1 2024-08-12 00:31:37,365 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 00:31:54,301 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 00:31:57,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1364520.0, ans=0.125 2024-08-12 00:31:59,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1364520.0, ans=0.0 2024-08-12 00:32:11,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1364620.0, ans=0.0 2024-08-12 00:32:12,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1364620.0, ans=0.125 2024-08-12 00:32:21,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1364720.0, ans=0.125 2024-08-12 00:32:22,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1364720.0, ans=0.125 2024-08-12 00:32:33,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1364820.0, ans=0.0 2024-08-12 00:32:34,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6050, loss[loss=0.1146, beats_loss=0.01353, ecapa_loss=0.0001682, whisper_loss=0.09939, over 22449.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01115, ecapa_loss=0.000186, whisper_loss=0.09201, over 3864009.85 frames. ], batch size: 90, lr: 6.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:32:41,785 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 00:32:56,737 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 00:33:06,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1365020.0, ans=0.05 2024-08-12 00:33:12,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1365020.0, ans=0.2 2024-08-12 00:33:13,577 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 00:33:16,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.639e+01 2.972e+01 3.364e+01 6.267e+01, threshold=5.943e+01, percent-clipped=1.0 2024-08-12 00:33:16,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1365120.0, ans=0.125 2024-08-12 00:33:25,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1365120.0, ans=0.1 2024-08-12 00:33:26,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1365120.0, ans=0.125 2024-08-12 00:33:31,736 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 00:33:44,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6100, loss[loss=0.08908, beats_loss=0.01222, ecapa_loss=0.0001956, whisper_loss=0.0749, over 19624.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01116, ecapa_loss=0.0001846, whisper_loss=0.09254, over 3867641.81 frames. ], batch size: 83, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:33:58,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1365420.0, ans=0.0 2024-08-12 00:34:45,313 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
12 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 00:34:54,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6150, loss[loss=0.08903, beats_loss=0.01126, ecapa_loss=0.0002318, whisper_loss=0.07546, over 15916.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01123, ecapa_loss=0.0001853, whisper_loss=0.09209, over 3857477.24 frames. ], batch size: 67, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:35:00,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2024-08-12 00:35:06,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-08-12 00:35:27,634 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 00:35:27,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1366020.0, ans=0.025 2024-08-12 00:35:28,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-08-12 00:35:36,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.497e+01 2.771e+01 3.038e+01 4.710e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-12 00:36:03,263 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6200, loss[loss=0.1082, beats_loss=0.01244, ecapa_loss=0.0001538, whisper_loss=0.09427, over 19532.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0113, ecapa_loss=0.0001834, whisper_loss=0.09173, over 3848770.02 frames. ], batch size: 76, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:36:13,365 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 00:36:24,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=12.0 2024-08-12 00:36:31,267 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 29 from Vox, 21 fro AS 2024-08-12 00:36:57,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1366720.0, ans=0.125 2024-08-12 00:36:59,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1366720.0, ans=10.0 2024-08-12 00:37:10,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1366720.0, ans=0.0 2024-08-12 00:37:12,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6250, loss[loss=0.09088, beats_loss=0.01172, ecapa_loss=0.0001864, whisper_loss=0.0773, over 16225.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01123, ecapa_loss=0.0001844, whisper_loss=0.09126, over 3835625.40 frames. ], batch size: 61, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:37:19,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1366820.0, ans=0.125 2024-08-12 00:37:23,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2024-08-12 00:37:23,964 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
30 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 00:37:25,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1366920.0, ans=0.0 2024-08-12 00:37:35,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1366920.0, ans=0.0 2024-08-12 00:37:38,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1366920.0, ans=0.1 2024-08-12 00:37:44,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1367020.0, ans=0.0 2024-08-12 00:37:47,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-12 00:37:53,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.633e+01 2.869e+01 3.281e+01 7.272e+01, threshold=5.739e+01, percent-clipped=3.0 2024-08-12 00:37:54,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1367120.0, ans=0.0 2024-08-12 00:37:56,918 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 00:38:07,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1367220.0, ans=0.125 2024-08-12 00:38:13,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=12.0 2024-08-12 00:38:21,929 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6300, loss[loss=0.1223, beats_loss=0.01138, ecapa_loss=0.0002021, whisper_loss=0.109, over 21041.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01125, ecapa_loss=0.000184, whisper_loss=0.09182, over 3814976.05 frames. 
], batch size: 89, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:38:23,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2024-08-12 00:38:24,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1367320.0, ans=0.125 2024-08-12 00:38:25,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2024-08-12 00:38:46,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.13 vs. limit=22.5 2024-08-12 00:38:49,136 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 00:38:53,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1367520.0, ans=0.1 2024-08-12 00:38:58,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1367520.0, ans=0.0 2024-08-12 00:39:11,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1367620.0, ans=0.2 2024-08-12 00:39:30,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6350, loss[loss=0.08991, beats_loss=0.0125, ecapa_loss=0.0001693, whisper_loss=0.07572, over 15237.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01117, ecapa_loss=0.0001845, whisper_loss=0.09258, over 3841799.65 frames. 
], batch size: 62, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:39:31,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1367820.0, ans=0.125 2024-08-12 00:39:33,921 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 00:39:44,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=12.0 2024-08-12 00:39:56,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1367920.0, ans=0.1 2024-08-12 00:40:05,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-08-12 00:40:12,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.594e+01 2.991e+01 3.551e+01 3.558e+02, threshold=5.982e+01, percent-clipped=1.0 2024-08-12 00:40:28,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.21 vs. limit=10.0 2024-08-12 00:40:40,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6400, loss[loss=0.1154, beats_loss=0.01392, ecapa_loss=0.0001913, whisper_loss=0.09956, over 22116.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01119, ecapa_loss=0.0001846, whisper_loss=0.09236, over 3833926.36 frames. ], batch size: 91, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:40:45,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-12 00:40:45,952 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 00:40:58,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1368420.0, ans=0.0 2024-08-12 00:41:09,874 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 00:41:10,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1368520.0, ans=0.1 2024-08-12 00:41:20,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1368620.0, ans=0.0 2024-08-12 00:41:47,988 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-12 00:41:49,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6450, loss[loss=0.1014, beats_loss=0.01168, ecapa_loss=0.0001129, whisper_loss=0.08856, over 15095.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0112, ecapa_loss=0.0001845, whisper_loss=0.09254, over 3871509.73 frames. ], batch size: 55, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:42:03,960 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 00:42:10,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. 
limit=6.0 2024-08-12 00:42:15,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1369020.0, ans=0.125 2024-08-12 00:42:27,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1369020.0, ans=0.07 2024-08-12 00:42:30,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.638e+01 2.996e+01 3.413e+01 4.809e+01, threshold=5.992e+01, percent-clipped=1.0 2024-08-12 00:42:55,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1369220.0, ans=0.1 2024-08-12 00:42:58,165 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6500, loss[loss=0.1201, beats_loss=0.01013, ecapa_loss=0.0001918, whisper_loss=0.1081, over 23278.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01117, ecapa_loss=0.0001851, whisper_loss=0.09313, over 3880126.98 frames. ], batch size: 92, lr: 6.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:42:59,769 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-12 00:43:01,125 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 00:43:06,660 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 00:43:27,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1369520.0, ans=0.125 2024-08-12 00:43:59,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-12 00:44:07,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6550, loss[loss=0.1137, beats_loss=0.01204, ecapa_loss=0.0002249, whisper_loss=0.09944, over 19246.00 frames. 
], tot_loss[loss=0.1062, beats_loss=0.01116, ecapa_loss=0.0001854, whisper_loss=0.09319, over 3880145.53 frames. ], batch size: 79, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:44:21,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1369920.0, ans=0.2 2024-08-12 00:44:26,500 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 00:44:40,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1370020.0, ans=0.0 2024-08-12 00:44:43,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1370020.0, ans=0.125 2024-08-12 00:44:47,368 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 00:44:48,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.184e+01 2.662e+01 3.000e+01 3.439e+01 5.833e+01, threshold=5.999e+01, percent-clipped=0.0 2024-08-12 00:45:02,382 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 00:45:16,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6600, loss[loss=0.1019, beats_loss=0.01367, ecapa_loss=0.0001553, whisper_loss=0.08667, over 22902.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01112, ecapa_loss=0.0001862, whisper_loss=0.09363, over 3917755.08 frames. ], batch size: 92, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:45:38,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1370420.0, ans=0.0 2024-08-12 00:45:42,139 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 00:45:49,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1370520.0, ans=0.2 2024-08-12 00:45:52,279 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 00:46:10,256 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 00:46:25,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6650, loss[loss=0.0756, beats_loss=0.01514, ecapa_loss=0.0002001, whisper_loss=0.05845, over 16684.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01114, ecapa_loss=0.0001872, whisper_loss=0.09343, over 3940516.50 frames. ], batch size: 74, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:46:27,990 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 34 from Vox, 23 fro AS 2024-08-12 00:46:35,710 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:46:48,928 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 00:47:05,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1371120.0, ans=0.1 2024-08-12 00:47:05,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1371120.0, ans=0.0 2024-08-12 00:47:06,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.593e+01 2.812e+01 3.124e+01 4.169e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 00:47:09,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1371120.0, ans=0.1 2024-08-12 00:47:18,138 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 00:47:24,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-12 00:47:34,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6700, loss[loss=0.1168, beats_loss=0.0111, ecapa_loss=0.0001773, whisper_loss=0.1039, over 22798.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01112, ecapa_loss=0.0001868, whisper_loss=0.09324, over 3928347.93 frames. ], batch size: 93, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:47:37,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1371320.0, ans=0.125 2024-08-12 00:47:40,650 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 00:47:48,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1371420.0, ans=0.0 2024-08-12 00:47:56,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2024-08-12 00:48:02,715 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 00:48:13,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-12 00:48:18,164 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 00:48:33,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. 
limit=6.0 2024-08-12 00:48:35,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1371720.0, ans=0.125 2024-08-12 00:48:44,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6750, loss[loss=0.1268, beats_loss=0.01045, ecapa_loss=0.0001424, whisper_loss=0.1149, over 21852.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01116, ecapa_loss=0.0001867, whisper_loss=0.09368, over 3922089.85 frames. ], batch size: 83, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:49:17,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.73 vs. limit=10.0 2024-08-12 00:49:26,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.541e+01 2.925e+01 3.464e+01 4.634e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 00:49:30,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1372120.0, ans=0.2 2024-08-12 00:49:45,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1372220.0, ans=0.0 2024-08-12 00:49:52,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1372220.0, ans=0.015 2024-08-12 00:49:53,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.01 vs. limit=6.0 2024-08-12 00:49:54,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6800, loss[loss=0.07997, beats_loss=0.01311, ecapa_loss=0.0001578, whisper_loss=0.06528, over 22100.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01113, ecapa_loss=0.0001878, whisper_loss=0.09342, over 3923135.39 frames. 
], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:49:54,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1372320.0, ans=0.0 2024-08-12 00:49:58,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1372320.0, ans=0.1 2024-08-12 00:50:07,116 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 00:50:09,827 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 00:50:22,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1372520.0, ans=0.0 2024-08-12 00:50:25,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-08-12 00:50:29,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1372520.0, ans=0.07 2024-08-12 00:50:40,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1372620.0, ans=0.125 2024-08-12 00:50:41,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1372620.0, ans=0.125 2024-08-12 00:50:47,013 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 00:50:47,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1372620.0, ans=0.125 2024-08-12 00:50:48,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1372720.0, ans=0.125 2024-08-12 00:50:54,265 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 00:50:55,619 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 00:50:56,945 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 00:50:58,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1372720.0, ans=0.05 2024-08-12 00:51:03,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6850, loss[loss=0.1225, beats_loss=0.01013, ecapa_loss=0.0001959, whisper_loss=0.1104, over 21403.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001868, whisper_loss=0.09305, over 3901622.87 frames. ], batch size: 89, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:51:37,914 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 00:51:39,182 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 00:51:44,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.602e+01 2.969e+01 3.307e+01 6.186e+01, threshold=5.938e+01, percent-clipped=1.0 2024-08-12 00:52:12,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6900, loss[loss=0.1051, beats_loss=0.01026, ecapa_loss=0.0001877, whisper_loss=0.09298, over 23018.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01108, ecapa_loss=0.0001865, whisper_loss=0.09386, over 3928156.77 frames. ], batch size: 94, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:52:38,737 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 00:52:44,172 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 00:53:11,192 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 00:53:23,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 6950, loss[loss=0.1029, beats_loss=0.01135, ecapa_loss=0.0001535, whisper_loss=0.08997, over 16397.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01116, ecapa_loss=0.0001856, whisper_loss=0.09308, over 3907823.64 frames. ], batch size: 63, lr: 6.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:53:26,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1373820.0, ans=0.125 2024-08-12 00:53:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1373820.0, ans=0.2 2024-08-12 00:53:33,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1373820.0, ans=0.2 2024-08-12 00:53:47,994 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 00:53:52,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=12.0 2024-08-12 00:53:53,182 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
31 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 00:54:03,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1374020.0, ans=0.125 2024-08-12 00:54:05,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.522e+01 2.749e+01 3.045e+01 4.953e+01, threshold=5.497e+01, percent-clipped=0.0 2024-08-12 00:54:06,094 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 00:54:09,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1374120.0, ans=0.125 2024-08-12 00:54:11,835 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 00:54:13,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1374120.0, ans=0.125 2024-08-12 00:54:19,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1374220.0, ans=0.0 2024-08-12 00:54:20,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1374220.0, ans=0.125 2024-08-12 00:54:23,169 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 00:54:27,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1374220.0, ans=0.07 2024-08-12 00:54:33,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7000, loss[loss=0.1229, beats_loss=0.01142, ecapa_loss=0.0001826, whisper_loss=0.1096, over 22862.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01119, ecapa_loss=0.0001849, whisper_loss=0.09338, over 3901133.95 frames. 
], batch size: 90, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:54:41,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1374320.0, ans=0.125 2024-08-12 00:55:15,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1374620.0, ans=0.0 2024-08-12 00:55:33,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1374720.0, ans=0.07 2024-08-12 00:55:41,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7050, loss[loss=0.08693, beats_loss=0.01208, ecapa_loss=0.0002479, whisper_loss=0.07237, over 12821.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01114, ecapa_loss=0.0001863, whisper_loss=0.09312, over 3865514.49 frames. ], batch size: 58, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:55:42,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1374820.0, ans=0.2 2024-08-12 00:55:43,411 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 00:55:50,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1374820.0, ans=0.125 2024-08-12 00:55:51,797 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-12 00:56:00,888 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 00:56:18,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1375020.0, ans=0.1 2024-08-12 00:56:23,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.564e+01 2.939e+01 3.594e+01 1.844e+02, threshold=5.878e+01, percent-clipped=7.0 2024-08-12 00:56:31,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1375120.0, ans=0.125 2024-08-12 00:56:36,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1375220.0, ans=0.125 2024-08-12 00:56:38,554 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 19 from LS+wenet, 18 from Vox, 56 fro AS 2024-08-12 00:56:41,105 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 00:56:47,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1375220.0, ans=0.0 2024-08-12 00:56:50,738 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7100, loss[loss=0.08875, beats_loss=0.01511, ecapa_loss=0.0001665, whisper_loss=0.07198, over 17616.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001867, whisper_loss=0.09257, over 3860077.94 frames. ], batch size: 71, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:57:15,484 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 00:57:59,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7150, loss[loss=0.1006, beats_loss=0.01151, ecapa_loss=0.0001656, whisper_loss=0.08745, over 21187.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0111, ecapa_loss=0.0001868, whisper_loss=0.09243, over 3847948.13 frames. 
], batch size: 84, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:58:04,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2024-08-12 00:58:07,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1375820.0, ans=0.125 2024-08-12 00:58:38,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1376020.0, ans=0.125 2024-08-12 00:58:39,356 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-12 00:58:42,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.592e+01 2.864e+01 3.293e+01 5.608e+01, threshold=5.729e+01, percent-clipped=0.0 2024-08-12 00:58:44,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1376120.0, ans=0.0 2024-08-12 00:59:09,142 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7200, loss[loss=0.07404, beats_loss=0.01269, ecapa_loss=0.0002224, whisper_loss=0.05913, over 15929.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01114, ecapa_loss=0.0001852, whisper_loss=0.09195, over 3881344.16 frames. ], batch size: 70, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 00:59:12,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1376320.0, ans=0.125 2024-08-12 00:59:16,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1376320.0, ans=0.125 2024-08-12 00:59:32,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. 
limit=15.0 2024-08-12 00:59:36,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2024-08-12 00:59:37,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2024-08-12 00:59:53,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1376620.0, ans=0.0 2024-08-12 00:59:56,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-12 01:00:05,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1376720.0, ans=0.0 2024-08-12 01:00:07,774 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 01:00:09,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1376720.0, ans=0.1 2024-08-12 01:00:17,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7250, loss[loss=0.1182, beats_loss=0.01047, ecapa_loss=0.0002143, whisper_loss=0.1056, over 21587.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01123, ecapa_loss=0.0001844, whisper_loss=0.09155, over 3906436.49 frames. ], batch size: 90, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:00:22,270 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 01:00:36,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. 
limit=10.0 2024-08-12 01:00:38,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-12 01:00:58,059 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 01:00:59,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.064e+01 2.509e+01 2.818e+01 3.163e+01 4.594e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-12 01:01:05,100 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 01:01:13,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1377220.0, ans=0.0 2024-08-12 01:01:16,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1377220.0, ans=0.125 2024-08-12 01:01:22,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1377220.0, ans=0.125 2024-08-12 01:01:27,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7300, loss[loss=0.12, beats_loss=0.0107, ecapa_loss=0.0001704, whisper_loss=0.1076, over 21308.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01118, ecapa_loss=0.0001846, whisper_loss=0.09204, over 3895264.96 frames. 
], batch size: 82, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:01:27,886 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.881e-02 2024-08-12 01:01:32,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1377320.0, ans=0.125 2024-08-12 01:02:12,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1377620.0, ans=0.1 2024-08-12 01:02:19,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.95 vs. limit=22.5 2024-08-12 01:02:29,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5 2024-08-12 01:02:33,333 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 01:02:37,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7350, loss[loss=0.1107, beats_loss=0.01068, ecapa_loss=0.0001467, whisper_loss=0.09851, over 21114.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001852, whisper_loss=0.09153, over 3885945.67 frames. ], batch size: 78, lr: 6.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:02:51,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1377920.0, ans=0.0 2024-08-12 01:02:56,552 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 01:03:05,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0 2024-08-12 01:03:07,458 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
29 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-12 01:03:18,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.545e+01 2.938e+01 3.274e+01 5.414e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 01:03:22,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2024-08-12 01:03:23,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1378120.0, ans=0.0 2024-08-12 01:03:23,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1378120.0, ans=0.125 2024-08-12 01:03:40,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=12.0 2024-08-12 01:03:41,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1378220.0, ans=0.0 2024-08-12 01:03:46,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7400, loss[loss=0.1031, beats_loss=0.01075, ecapa_loss=0.0002109, whisper_loss=0.09025, over 20095.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01116, ecapa_loss=0.0001852, whisper_loss=0.09122, over 3885351.04 frames. 
], batch size: 82, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:04:11,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1378420.0, ans=0.125 2024-08-12 01:04:17,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1378520.0, ans=0.0 2024-08-12 01:04:33,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1378620.0, ans=0.0 2024-08-12 01:04:44,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1378720.0, ans=0.125 2024-08-12 01:04:54,961 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7450, loss[loss=0.08348, beats_loss=0.009175, ecapa_loss=0.0002521, whisper_loss=0.07178, over 14674.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01118, ecapa_loss=0.0001851, whisper_loss=0.09171, over 3887855.94 frames. ], batch size: 61, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:04:55,328 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 01:05:10,192 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 01:05:13,219 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 01:05:26,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1379020.0, ans=0.1 2024-08-12 01:05:33,779 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 01:05:36,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.504e+01 2.763e+01 3.240e+01 5.325e+01, threshold=5.527e+01, percent-clipped=0.0 2024-08-12 01:05:50,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.69 vs. limit=22.5 2024-08-12 01:05:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1379220.0, ans=0.09899494936611666 2024-08-12 01:05:58,962 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 01:06:00,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1379220.0, ans=0.0 2024-08-12 01:06:04,701 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7500, loss[loss=0.1083, beats_loss=0.009468, ecapa_loss=0.0001893, whisper_loss=0.09697, over 18714.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01118, ecapa_loss=0.0001855, whisper_loss=0.09158, over 3893109.87 frames. ], batch size: 74, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:06:06,553 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 01:06:12,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1379320.0, ans=0.125 2024-08-12 01:06:25,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1379420.0, ans=0.125 2024-08-12 01:06:33,941 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 01:06:35,700 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 01:06:43,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.61 vs. limit=10.0 2024-08-12 01:06:52,745 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 01:07:16,617 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7550, loss[loss=0.08614, beats_loss=0.01418, ecapa_loss=0.0001829, whisper_loss=0.07013, over 22115.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.0001864, whisper_loss=0.09198, over 3895504.36 frames. ], batch size: 94, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:07:32,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1379920.0, ans=0.125 2024-08-12 01:07:34,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-12 01:07:50,950 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 01:07:59,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.522e+01 2.796e+01 3.153e+01 8.804e+01, threshold=5.592e+01, percent-clipped=1.0 2024-08-12 01:08:17,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1380220.0, ans=0.125 2024-08-12 01:08:22,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1380220.0, ans=0.0 2024-08-12 01:08:22,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1380220.0, ans=0.0 2024-08-12 01:08:28,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7600, loss[loss=0.1041, beats_loss=0.00981, ecapa_loss=0.0002301, whisper_loss=0.09197, over 20900.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.0001857, whisper_loss=0.09171, over 3881695.84 frames. ], batch size: 90, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:08:29,086 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 01:08:36,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1380320.0, ans=0.125 2024-08-12 01:08:44,024 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 01:08:53,246 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 01:09:14,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1380620.0, ans=0.1 2024-08-12 01:09:32,535 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-12 01:09:44,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7650, loss[loss=0.1095, beats_loss=0.01055, ecapa_loss=0.0002062, whisper_loss=0.09692, over 18565.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001857, whisper_loss=0.09229, over 3867829.64 frames. ], batch size: 74, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:09:47,608 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 17 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 01:10:07,332 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 01:10:10,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1380920.0, ans=0.0 2024-08-12 01:10:13,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1381020.0, ans=0.125 2024-08-12 01:10:19,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1381020.0, ans=0.1 2024-08-12 01:10:21,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1381020.0, ans=0.05 2024-08-12 01:10:24,872 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 01:10:30,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1381120.0, ans=0.125 2024-08-12 01:10:30,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.632e+01 2.933e+01 3.294e+01 6.262e+01, threshold=5.865e+01, percent-clipped=1.0 2024-08-12 01:10:37,564 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 01:11:01,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1381320.0, ans=0.125 2024-08-12 01:11:02,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7700, loss[loss=0.1138, beats_loss=0.01033, ecapa_loss=0.0002154, whisper_loss=0.1013, over 16398.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01104, ecapa_loss=0.0001861, whisper_loss=0.09276, over 3871692.36 frames. ], batch size: 66, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:11:10,376 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-12 01:11:10,620 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.703e-02 2024-08-12 01:11:19,284 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 01:11:22,160 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 01:11:33,361 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-12 01:11:42,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1381520.0, ans=0.95 2024-08-12 01:11:56,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-12 01:12:00,865 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 01:12:08,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1381720.0, ans=0.125 2024-08-12 01:12:10,866 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 01:12:16,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7750, loss[loss=0.1042, beats_loss=0.01107, ecapa_loss=0.000137, whisper_loss=0.09171, over 14517.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01107, ecapa_loss=0.0001846, whisper_loss=0.09248, over 3882592.30 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:12:34,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1381920.0, ans=0.0 2024-08-12 01:12:56,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1382020.0, ans=0.0 2024-08-12 01:12:57,756 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 01:12:59,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1382120.0, ans=0.125 2024-08-12 01:13:00,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 2.543e+01 2.861e+01 3.273e+01 8.260e+01, threshold=5.723e+01, percent-clipped=1.0 2024-08-12 01:13:03,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2024-08-12 01:13:21,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1382220.0, ans=0.0 2024-08-12 01:13:27,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1382220.0, ans=0.95 2024-08-12 01:13:31,310 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7800, loss[loss=0.107, beats_loss=0.009361, ecapa_loss=0.0002045, whisper_loss=0.09561, over 19625.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001823, whisper_loss=0.09213, over 3892809.99 frames. ], batch size: 79, lr: 6.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:13:31,571 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 01:13:42,935 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 01:14:04,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1382520.0, ans=0.1 2024-08-12 01:14:10,303 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 01:14:10,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1382520.0, ans=0.1 2024-08-12 01:14:18,619 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.270e+01 2024-08-12 01:14:31,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0 2024-08-12 01:14:38,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1382720.0, ans=0.125 2024-08-12 01:14:41,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1382720.0, ans=0.04949747468305833 2024-08-12 01:14:45,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7850, loss[loss=0.08794, beats_loss=0.01344, ecapa_loss=0.0001724, whisper_loss=0.07278, over 18828.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001819, whisper_loss=0.09275, over 3900925.07 frames. 
], batch size: 77, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:14:49,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1382820.0, ans=0.0 2024-08-12 01:14:50,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1382820.0, ans=0.1 2024-08-12 01:15:09,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1382920.0, ans=0.125 2024-08-12 01:15:17,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2024-08-12 01:15:26,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1383020.0, ans=0.125 2024-08-12 01:15:29,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.565e+01 2.814e+01 3.165e+01 4.880e+01, threshold=5.628e+01, percent-clipped=0.0 2024-08-12 01:15:29,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1383120.0, ans=0.0 2024-08-12 01:15:42,622 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 01:15:42,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1383220.0, ans=0.0 2024-08-12 01:15:45,503 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 01:15:58,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7900, loss[loss=0.1012, beats_loss=0.01343, ecapa_loss=0.0001455, whisper_loss=0.08628, over 22737.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01118, ecapa_loss=0.0001823, whisper_loss=0.09266, over 3902940.28 frames. 
], batch size: 89, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:16:03,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1383320.0, ans=0.0 2024-08-12 01:16:05,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=12.0 2024-08-12 01:16:05,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2024-08-12 01:16:06,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1383320.0, ans=0.0 2024-08-12 01:16:14,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1383420.0, ans=0.125 2024-08-12 01:16:37,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1383520.0, ans=0.125 2024-08-12 01:16:42,429 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 01:16:58,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1383720.0, ans=0.09899494936611666 2024-08-12 01:16:59,855 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 01:17:03,010 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 01:17:12,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 7950, loss[loss=0.09314, beats_loss=0.01522, ecapa_loss=0.0001559, whisper_loss=0.07637, over 18839.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01126, ecapa_loss=0.000183, whisper_loss=0.09283, over 3911064.70 frames. 
], batch size: 78, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:17:13,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0 2024-08-12 01:17:20,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1383820.0, ans=0.2 2024-08-12 01:17:22,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1383820.0, ans=0.125 2024-08-12 01:17:28,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1383920.0, ans=0.125 2024-08-12 01:17:35,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1383920.0, ans=0.1 2024-08-12 01:17:35,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1383920.0, ans=0.0 2024-08-12 01:17:43,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1384020.0, ans=0.1 2024-08-12 01:17:57,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.551e+01 2.931e+01 3.391e+01 6.201e+01, threshold=5.862e+01, percent-clipped=1.0 2024-08-12 01:18:16,703 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-12 01:18:17,977 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 01:18:26,630 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8000, loss[loss=0.1138, beats_loss=0.01132, ecapa_loss=0.0001893, whisper_loss=0.1006, over 21848.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01123, ecapa_loss=0.0001834, whisper_loss=0.09294, over 3886921.38 frames. 
], batch size: 90, lr: 6.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:18:50,578 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 01:18:55,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1384520.0, ans=0.125 2024-08-12 01:19:00,674 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 01:19:10,329 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 01:19:32,619 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 01:19:33,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-12 01:19:39,243 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8050, loss[loss=0.1112, beats_loss=0.009227, ecapa_loss=0.0002114, whisper_loss=0.09985, over 19212.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01127, ecapa_loss=0.0001829, whisper_loss=0.09192, over 3841731.80 frames. 
], batch size: 77, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:20:01,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1384920.0, ans=0.1 2024-08-12 01:20:08,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1385020.0, ans=0.07 2024-08-12 01:20:09,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1385020.0, ans=0.1 2024-08-12 01:20:22,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.542e+01 2.903e+01 3.299e+01 4.788e+01, threshold=5.807e+01, percent-clipped=0.0 2024-08-12 01:20:44,903 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 01:20:51,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8100, loss[loss=0.09493, beats_loss=0.0114, ecapa_loss=0.0002019, whisper_loss=0.08151, over 20792.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01116, ecapa_loss=0.0001833, whisper_loss=0.09256, over 3830467.19 frames. ], batch size: 86, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:21:00,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1385320.0, ans=0.05 2024-08-12 01:21:00,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1385320.0, ans=0.2 2024-08-12 01:21:07,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1385420.0, ans=0.125 2024-08-12 01:21:18,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1385420.0, ans=0.0 2024-08-12 01:21:25,458 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-12 01:21:25,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1385520.0, ans=0.125 2024-08-12 01:21:25,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1385520.0, ans=0.07 2024-08-12 01:21:51,586 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-12 01:21:51,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1385720.0, ans=0.125 2024-08-12 01:21:55,515 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 01:22:00,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1385720.0, ans=0.125 2024-08-12 01:22:04,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8150, loss[loss=0.109, beats_loss=0.01104, ecapa_loss=0.0001283, whisper_loss=0.09671, over 16964.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01117, ecapa_loss=0.0001843, whisper_loss=0.09199, over 3817283.09 frames. 
], batch size: 62, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:22:11,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1385820.0, ans=0.1 2024-08-12 01:22:18,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1385920.0, ans=0.2 2024-08-12 01:22:36,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1386020.0, ans=0.0 2024-08-12 01:22:42,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1386020.0, ans=0.1 2024-08-12 01:22:46,748 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-12 01:22:47,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.116e+01 2.599e+01 2.928e+01 3.345e+01 4.607e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 01:23:06,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1386220.0, ans=0.125 2024-08-12 01:23:08,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1386220.0, ans=0.125 2024-08-12 01:23:17,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8200, loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001718, whisper_loss=0.09211, over 22678.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01119, ecapa_loss=0.0001847, whisper_loss=0.09203, over 3850284.73 frames. ], batch size: 89, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:23:28,315 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 01:23:32,732 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 01:23:41,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1386420.0, ans=0.125 2024-08-12 01:23:50,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1386520.0, ans=0.0 2024-08-12 01:23:56,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1386520.0, ans=0.05 2024-08-12 01:23:58,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1386520.0, ans=0.04949747468305833 2024-08-12 01:24:03,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1386620.0, ans=0.125 2024-08-12 01:24:07,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1386620.0, ans=0.5 2024-08-12 01:24:18,605 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 01:24:26,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1386720.0, ans=0.07 2024-08-12 01:24:28,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1386720.0, ans=0.125 2024-08-12 01:24:29,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.34 vs. 
limit=12.0 2024-08-12 01:24:31,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1386820.0, ans=0.2 2024-08-12 01:24:32,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8250, loss[loss=0.1078, beats_loss=0.01127, ecapa_loss=0.0001928, whisper_loss=0.09456, over 17955.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0112, ecapa_loss=0.0001842, whisper_loss=0.09189, over 3858873.35 frames. ], batch size: 71, lr: 6.30e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:24:33,798 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 01:24:34,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1386820.0, ans=0.125 2024-08-12 01:24:39,523 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 10 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-12 01:24:39,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1386820.0, ans=0.125 2024-08-12 01:24:45,161 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 01:24:50,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1386920.0, ans=0.025 2024-08-12 01:25:16,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.606e+01 2.891e+01 3.345e+01 5.457e+01, threshold=5.782e+01, percent-clipped=0.0 2024-08-12 01:25:23,887 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 01:25:39,229 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
19 from LS+wenet, 24 from Vox, 50 fro AS 2024-08-12 01:25:46,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8300, loss[loss=0.08143, beats_loss=0.009313, ecapa_loss=0.0001971, whisper_loss=0.07015, over 16430.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01128, ecapa_loss=0.000184, whisper_loss=0.09066, over 3845449.81 frames. ], batch size: 66, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:26:00,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1387420.0, ans=0.125 2024-08-12 01:26:04,767 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 01:26:06,123 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-12 01:26:22,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1387520.0, ans=0.1 2024-08-12 01:26:39,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1387620.0, ans=0.2 2024-08-12 01:26:51,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1387720.0, ans=0.1 2024-08-12 01:26:57,203 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 01:27:00,802 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 01:27:02,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8350, loss[loss=0.0939, beats_loss=0.01247, ecapa_loss=0.0001649, whisper_loss=0.07979, over 19593.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01126, ecapa_loss=0.0001836, whisper_loss=0.09118, over 3857800.63 frames. 
], batch size: 81, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:27:14,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1387820.0, ans=0.05 2024-08-12 01:27:25,113 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 01:27:28,326 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-12 01:27:32,052 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 01:27:35,124 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 01:27:39,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1388020.0, ans=0.0 2024-08-12 01:27:46,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=15.0 2024-08-12 01:27:47,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.752e+01 3.106e+01 3.684e+01 1.573e+02, threshold=6.213e+01, percent-clipped=3.0 2024-08-12 01:27:54,753 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 01:27:56,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1388120.0, ans=0.1 2024-08-12 01:28:03,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1388220.0, ans=0.05 2024-08-12 01:28:16,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8400, loss[loss=0.1166, beats_loss=0.0101, ecapa_loss=0.0001653, whisper_loss=0.1048, over 22617.00 frames. 
], tot_loss[loss=0.1054, beats_loss=0.01114, ecapa_loss=0.0001856, whisper_loss=0.09238, over 3855559.69 frames. ], batch size: 88, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:28:22,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1388320.0, ans=0.125 2024-08-12 01:28:25,301 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 01:28:25,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1388320.0, ans=0.025 2024-08-12 01:28:26,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1388320.0, ans=0.0 2024-08-12 01:29:00,083 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 01:29:17,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1388720.0, ans=0.2 2024-08-12 01:29:29,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8450, loss[loss=0.1267, beats_loss=0.00869, ecapa_loss=0.0001958, whisper_loss=0.1161, over 16727.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01099, ecapa_loss=0.0001857, whisper_loss=0.09353, over 3876974.61 frames. ], batch size: 63, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:29:34,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1388820.0, ans=0.125 2024-08-12 01:29:42,635 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 01:29:43,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1388920.0, ans=0.1 2024-08-12 01:29:44,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-12 01:29:51,402 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 10 from Vox, 37 fro AS 2024-08-12 01:30:09,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1389020.0, ans=0.0 2024-08-12 01:30:12,256 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+01 2.661e+01 3.023e+01 3.413e+01 6.376e+01, threshold=6.047e+01, percent-clipped=1.0 2024-08-12 01:30:24,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1389120.0, ans=0.125 2024-08-12 01:30:27,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2024-08-12 01:30:35,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-12 01:30:35,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-12 01:30:40,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8500, loss[loss=0.0752, beats_loss=0.0158, ecapa_loss=0.0001175, whisper_loss=0.05822, over 15086.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.011, ecapa_loss=0.0001857, whisper_loss=0.09385, over 3874142.43 frames. 
], batch size: 60, lr: 6.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 01:30:43,302 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 01:30:53,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1389420.0, ans=0.0 2024-08-12 01:31:04,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1389420.0, ans=0.125 2024-08-12 01:31:19,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2024-08-12 01:31:23,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1389620.0, ans=0.0 2024-08-12 01:31:33,365 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 01:31:52,517 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8550, loss[loss=0.1261, beats_loss=0.008992, ecapa_loss=0.0001796, whisper_loss=0.1153, over 23233.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.011, ecapa_loss=0.0001845, whisper_loss=0.0938, over 3864900.05 frames. ], batch size: 88, lr: 6.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 01:31:52,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1389820.0, ans=0.2 2024-08-12 01:31:53,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. 
limit=12.0
2024-08-12 01:31:55,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1389820.0, ans=0.125
2024-08-12 01:32:07,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1389920.0, ans=0.0
2024-08-12 01:32:16,128 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 from AS
2024-08-12 01:32:17,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1389920.0, ans=0.125
2024-08-12 01:32:20,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1390020.0, ans=0.2
2024-08-12 01:32:37,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.566e+01 2.875e+01 3.249e+01 7.628e+01, threshold=5.750e+01, percent-clipped=1.0
2024-08-12 01:32:40,195 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 from AS
2024-08-12 01:32:47,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1390120.0, ans=0.025
2024-08-12 01:32:53,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1390220.0, ans=0.125
2024-08-12 01:32:55,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1390220.0, ans=0.0
2024-08-12 01:32:57,279 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS
2024-08-12 01:33:03,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8600, loss[loss=0.1025, beats_loss=0.01154, ecapa_loss=0.0001684, whisper_loss=0.08929, over 14354.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01107, ecapa_loss=0.0001849, whisper_loss=0.09355, over 3852012.63 frames. ], batch size: 55, lr: 6.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:33:09,548 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 from AS
2024-08-12 01:33:13,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=15.0
2024-08-12 01:33:21,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1390420.0, ans=0.5
2024-08-12 01:33:27,036 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 from AS
2024-08-12 01:33:44,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1390520.0, ans=0.0
2024-08-12 01:33:47,074 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 from AS
2024-08-12 01:33:50,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1390620.0, ans=0.0
2024-08-12 01:33:57,143 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 01:34:02,444 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 from AS
2024-08-12 01:34:03,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=15.0
2024-08-12 01:34:12,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5
2024-08-12 01:34:14,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8650, loss[loss=0.1242, beats_loss=0.01035, ecapa_loss=0.0001701, whisper_loss=0.1121, over 24080.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01113, ecapa_loss=0.0001845, whisper_loss=0.09326, over 3891525.12 frames. ], batch size: 91, lr: 6.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:34:14,443 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 22 from Vox, 32 from AS
2024-08-12 01:34:36,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0
2024-08-12 01:34:44,270 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 12 from LS+wenet, 23 from Vox, 51 from AS
2024-08-12 01:34:54,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1391020.0, ans=0.125
2024-08-12 01:34:55,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1391120.0, ans=0.0
2024-08-12 01:34:56,529 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS
2024-08-12 01:34:57,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.624e+01 3.118e+01 3.764e+01 6.887e+01, threshold=6.237e+01, percent-clipped=2.0
2024-08-12 01:34:59,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1391120.0, ans=0.125
2024-08-12 01:35:16,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1391220.0, ans=0.125
2024-08-12 01:35:21,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2024-08-12 01:35:25,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8700, loss[loss=0.09766, beats_loss=0.009818, ecapa_loss=0.0001984, whisper_loss=0.08586, over 22615.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01105, ecapa_loss=0.000186, whisper_loss=0.0931, over 3868993.19 frames. ], batch size: 93, lr: 6.29e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:35:36,888 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 01:35:46,848 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 01:35:47,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1391420.0, ans=15.0
2024-08-12 01:36:16,695 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 from AS
2024-08-12 01:36:24,317 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS
2024-08-12 01:36:39,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8750, loss[loss=0.1147, beats_loss=0.007843, ecapa_loss=0.0001763, whisper_loss=0.1051, over 19963.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001865, whisper_loss=0.09288, over 3846561.11 frames. ], batch size: 76, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:36:46,482 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 from AS
2024-08-12 01:36:51,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1391820.0, ans=0.2
2024-08-12 01:36:51,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0
2024-08-12 01:37:25,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.128e+01 2.651e+01 2.928e+01 3.365e+01 6.201e+01, threshold=5.855e+01, percent-clipped=0.0
2024-08-12 01:37:33,041 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 from AS
2024-08-12 01:37:40,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-12 01:37:53,994 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8800, loss[loss=0.1022, beats_loss=0.01076, ecapa_loss=0.0001923, whisper_loss=0.08949, over 16649.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01111, ecapa_loss=0.0001853, whisper_loss=0.09209, over 3818746.52 frames. ], batch size: 66, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:37:54,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1392320.0, ans=0.0
2024-08-12 01:38:04,968 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.608e-02
2024-08-12 01:38:14,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=12.0
2024-08-12 01:38:26,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.206e+00
2024-08-12 01:38:27,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1392520.0, ans=0.1
2024-08-12 01:38:36,764 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.812e-02
2024-08-12 01:38:43,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1392620.0, ans=0.125
2024-08-12 01:38:48,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1392620.0, ans=0.2
2024-08-12 01:38:51,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0
2024-08-12 01:38:55,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1392720.0, ans=0.125
2024-08-12 01:39:04,963 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 from AS
2024-08-12 01:39:08,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8850, loss[loss=0.1249, beats_loss=0.0111, ecapa_loss=0.0001365, whisper_loss=0.1124, over 20217.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01128, ecapa_loss=0.000183, whisper_loss=0.09108, over 3809405.79 frames. ], batch size: 74, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:39:20,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0
2024-08-12 01:39:29,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1392920.0, ans=0.1
2024-08-12 01:39:34,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1392920.0, ans=0.5
2024-08-12 01:39:38,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1393020.0, ans=0.125
2024-08-12 01:39:40,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0
2024-08-12 01:39:53,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.070e+01 2.605e+01 2.898e+01 3.315e+01 6.590e+01, threshold=5.796e+01, percent-clipped=1.0
2024-08-12 01:39:54,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0
2024-08-12 01:39:58,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1393120.0, ans=0.0
2024-08-12 01:40:02,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1393120.0, ans=0.2
2024-08-12 01:40:04,441 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 from AS
2024-08-12 01:40:12,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1393220.0, ans=0.0
2024-08-12 01:40:20,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8900, loss[loss=0.107, beats_loss=0.01136, ecapa_loss=0.0001896, whisper_loss=0.09372, over 13829.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01123, ecapa_loss=0.0001828, whisper_loss=0.09183, over 3811838.89 frames. ], batch size: 53, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:40:30,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1393320.0, ans=0.0
2024-08-12 01:40:43,459 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS
2024-08-12 01:40:47,571 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 from AS
2024-08-12 01:40:51,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1393520.0, ans=0.0
2024-08-12 01:40:57,106 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 from AS
2024-08-12 01:41:10,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1393620.0, ans=0.1
2024-08-12 01:41:14,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1393620.0, ans=0.0
2024-08-12 01:41:31,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 8950, loss[loss=0.1212, beats_loss=0.01085, ecapa_loss=0.000176, whisper_loss=0.1086, over 16558.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.0001833, whisper_loss=0.09199, over 3837897.18 frames. ], batch size: 63, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:41:44,510 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 25 from Vox, 45 from AS
2024-08-12 01:41:54,357 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-12 01:42:12,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1394120.0, ans=0.125
2024-08-12 01:42:13,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.161e+01 2.694e+01 3.111e+01 3.699e+01 1.037e+02, threshold=6.222e+01, percent-clipped=1.0
2024-08-12 01:42:38,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9000, loss[loss=0.1135, beats_loss=0.008148, ecapa_loss=0.0002413, whisper_loss=0.1029, over 22770.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01115, ecapa_loss=0.0001844, whisper_loss=0.09225, over 3852167.95 frames. ], batch size: 92, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:42:38,981 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-12 01:43:16,640 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on ASR_libri: loss=0.2567, beats_loss=0, ecapa_loss=0.0006076, whisper_loss=0.2507, over 922467.00 frames.
2024-08-12 01:43:26,337 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.8087, 2.7479, 2.4520, 2.1679], device='cuda:0')
2024-08-12 01:43:34,734 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on SV_voxceleb1: loss=0.005114, beats_loss=0, ecapa_loss=0.0005114, whisper_loss=0, over 939242.00 frames.
2024-08-12 01:44:25,001 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3252, 1.3453, 1.7063, 1.8793], device='cuda:0')
2024-08-12 01:45:19,146 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on AT_audioset: loss=0.02463, beats_loss=0.02463, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-12 01:45:19,155 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-12 01:45:30,224 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 from AS
2024-08-12 01:45:30,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1394320.0, ans=0.125
2024-08-12 01:46:18,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0
2024-08-12 01:46:22,812 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.840e-02
2024-08-12 01:46:23,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1394720.0, ans=0.0
2024-08-12 01:46:28,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9050, loss[loss=0.1043, beats_loss=0.01199, ecapa_loss=0.0001469, whisper_loss=0.09086, over 22380.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01121, ecapa_loss=0.0001838, whisper_loss=0.09174, over 3831196.97 frames. ], batch size: 87, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:46:29,085 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 from AS
2024-08-12 01:46:33,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1394820.0, ans=0.05
2024-08-12 01:46:45,803 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 from AS
2024-08-12 01:46:51,408 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 from AS
2024-08-12 01:47:05,311 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 01:47:11,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.578e+01 2.935e+01 3.281e+01 5.128e+01, threshold=5.870e+01, percent-clipped=0.0
2024-08-12 01:47:17,517 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 01:47:31,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1395220.0, ans=0.125
2024-08-12 01:47:37,891 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9100, loss[loss=0.1133, beats_loss=0.01136, ecapa_loss=0.0001416, whisper_loss=0.1006, over 16741.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0111, ecapa_loss=0.0001855, whisper_loss=0.09256, over 3852411.89 frames. ], batch size: 63, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:47:38,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1395320.0, ans=0.125
2024-08-12 01:47:50,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1395420.0, ans=0.125
2024-08-12 01:47:51,322 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 from AS
2024-08-12 01:47:57,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1395420.0, ans=0.125
2024-08-12 01:48:10,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=12.0
2024-08-12 01:48:12,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1395520.0, ans=0.125
2024-08-12 01:48:23,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1395620.0, ans=0.1
2024-08-12 01:48:26,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5
2024-08-12 01:48:35,584 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 from AS
2024-08-12 01:48:36,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2024-08-12 01:48:43,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1395720.0, ans=0.05
2024-08-12 01:48:45,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9150, loss[loss=0.1028, beats_loss=0.01258, ecapa_loss=0.0001757, whisper_loss=0.08849, over 21495.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001865, whisper_loss=0.09197, over 3861343.23 frames. ], batch size: 87, lr: 6.28e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:48:46,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1395820.0, ans=0.125
2024-08-12 01:49:19,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396020.0, ans=0.1
2024-08-12 01:49:28,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.582e+01 2.877e+01 3.376e+01 5.392e+01, threshold=5.754e+01, percent-clipped=0.0
2024-08-12 01:49:34,387 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 from AS
2024-08-12 01:49:39,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1396220.0, ans=0.125
2024-08-12 01:49:40,905 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS
2024-08-12 01:49:44,504 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 23 from Vox, 41 from AS
2024-08-12 01:49:45,709 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 from AS
2024-08-12 01:49:46,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1396220.0, ans=0.125
2024-08-12 01:49:53,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9200, loss[loss=0.0985, beats_loss=0.01286, ecapa_loss=0.0001906, whisper_loss=0.08373, over 17294.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001864, whisper_loss=0.09198, over 3856754.20 frames. ], batch size: 73, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:50:11,152 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 01:50:16,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1396420.0, ans=0.0
2024-08-12 01:50:22,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0
2024-08-12 01:50:24,387 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS
2024-08-12 01:50:32,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0
2024-08-12 01:50:33,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1396520.0, ans=0.125
2024-08-12 01:50:45,256 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 26 from Vox, 24 from AS
2024-08-12 01:50:53,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1396720.0, ans=0.1
2024-08-12 01:50:54,477 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 28 from Vox, 35 from AS
2024-08-12 01:51:02,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9250, loss[loss=0.09563, beats_loss=0.01238, ecapa_loss=0.0001958, whisper_loss=0.08129, over 20121.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01115, ecapa_loss=0.000185, whisper_loss=0.0919, over 3871758.58 frames. ], batch size: 84, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:51:22,829 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 22 from Vox, 24 from AS
2024-08-12 01:51:33,427 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS
2024-08-12 01:51:37,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0
2024-08-12 01:51:44,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.700e+01 2.936e+01 3.310e+01 8.820e+01, threshold=5.872e+01, percent-clipped=1.0
2024-08-12 01:51:48,713 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 01:51:50,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1397120.0, ans=0.1
2024-08-12 01:52:02,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.90 vs. limit=10.0
2024-08-12 01:52:09,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1397320.0, ans=0.1
2024-08-12 01:52:10,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9300, loss[loss=0.1062, beats_loss=0.01223, ecapa_loss=0.0001854, whisper_loss=0.09216, over 23656.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01111, ecapa_loss=0.0001847, whisper_loss=0.09235, over 3893714.31 frames. ], batch size: 94, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:52:31,035 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS
2024-08-12 01:52:33,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1397420.0, ans=0.125
2024-08-12 01:52:44,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1397520.0, ans=0.125
2024-08-12 01:52:44,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1397520.0, ans=0.1
2024-08-12 01:52:48,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1397520.0, ans=0.2
2024-08-12 01:53:07,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1397720.0, ans=0.04949747468305833
2024-08-12 01:53:19,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9350, loss[loss=0.1022, beats_loss=0.01346, ecapa_loss=0.0001659, whisper_loss=0.08705, over 18032.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01115, ecapa_loss=0.0001849, whisper_loss=0.09204, over 3885161.30 frames. ], batch size: 71, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:53:24,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1397820.0, ans=0.5
2024-08-12 01:53:28,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1397820.0, ans=0.125
2024-08-12 01:53:36,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1397920.0, ans=0.125
2024-08-12 01:53:36,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1397920.0, ans=0.125
2024-08-12 01:53:37,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1397920.0, ans=0.125
2024-08-12 01:53:39,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1397920.0, ans=0.2
2024-08-12 01:53:44,698 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS
2024-08-12 01:53:46,224 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 from AS
2024-08-12 01:53:49,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1398020.0, ans=0.1
2024-08-12 01:54:02,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.050e+01 2.487e+01 2.851e+01 3.233e+01 4.318e+01, threshold=5.702e+01, percent-clipped=0.0
2024-08-12 01:54:06,324 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.236e-01
2024-08-12 01:54:24,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1398220.0, ans=0.125
2024-08-12 01:54:24,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1398220.0, ans=0.125
2024-08-12 01:54:29,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9400, loss[loss=0.1046, beats_loss=0.01119, ecapa_loss=0.0001867, whisper_loss=0.09153, over 17933.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01118, ecapa_loss=0.0001845, whisper_loss=0.09215, over 3894910.99 frames. ], batch size: 71, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:54:47,392 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 34 from Vox, 34 from AS
2024-08-12 01:54:54,261 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 01:55:15,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1398620.0, ans=0.0
2024-08-12 01:55:30,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1398720.0, ans=0.2
2024-08-12 01:55:38,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9450, loss[loss=0.1006, beats_loss=0.009818, ecapa_loss=0.0001975, whisper_loss=0.08883, over 22868.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01113, ecapa_loss=0.0001839, whisper_loss=0.09189, over 3877384.10 frames. ], batch size: 93, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:55:58,840 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 26 from Vox, 42 from AS
2024-08-12 01:56:12,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1399020.0, ans=0.1
2024-08-12 01:56:15,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1399020.0, ans=0.125
2024-08-12 01:56:15,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1399020.0, ans=0.0
2024-08-12 01:56:20,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.626e+01 2.954e+01 3.375e+01 5.231e+01, threshold=5.908e+01, percent-clipped=0.0
2024-08-12 01:56:23,351 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 01:56:40,071 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS
2024-08-12 01:56:44,011 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 22 from Vox, 27 from AS
2024-08-12 01:56:45,454 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 from AS
2024-08-12 01:56:46,566 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9500, loss[loss=0.08271, beats_loss=0.01274, ecapa_loss=0.0002095, whisper_loss=0.06788, over 16604.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001839, whisper_loss=0.09261, over 3896898.04 frames. ], batch size: 71, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:57:09,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1399420.0, ans=0.5
2024-08-12 01:57:13,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1399520.0, ans=0.07
2024-08-12 01:57:18,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5
2024-08-12 01:57:41,829 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 01:57:45,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1399720.0, ans=0.0
2024-08-12 01:57:56,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9550, loss[loss=0.08449, beats_loss=0.01046, ecapa_loss=0.0002192, whisper_loss=0.07184, over 15760.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001848, whisper_loss=0.09181, over 3872513.97 frames. ], batch size: 69, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:57:59,065 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 from AS
2024-08-12 01:58:03,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1399820.0, ans=22.5
2024-08-12 01:58:12,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1399920.0, ans=0.1
2024-08-12 01:58:18,737 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-140000.pt
2024-08-12 01:58:31,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0
2024-08-12 01:58:32,706 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 from AS
2024-08-12 01:58:38,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1400120.0, ans=0.0
2024-08-12 01:58:40,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.623e+01 2.882e+01 3.186e+01 4.825e+01, threshold=5.764e+01, percent-clipped=0.0
2024-08-12 01:59:00,497 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.118e-02
2024-08-12 01:59:00,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1400220.0, ans=0.1
2024-08-12 01:59:00,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=3.58 vs. limit=15.0
2024-08-12 01:59:04,247 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS
2024-08-12 01:59:06,738 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9600, loss[loss=0.1, beats_loss=0.01114, ecapa_loss=0.0001933, whisper_loss=0.08692, over 23115.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.0001852, whisper_loss=0.09184, over 3868009.84 frames. ], batch size: 93, lr: 6.27e-03, grad_scale: 5.764607523034235e+17
2024-08-12 01:59:08,727 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-12 01:59:09,898 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 from AS
2024-08-12 01:59:16,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1400320.0, ans=15.0
2024-08-12 01:59:22,851 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 26 from LS+wenet, 20 from Vox, 17 from AS
2024-08-12 01:59:38,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2024-08-12 01:59:39,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1400520.0, ans=0.0
2024-08-12 01:59:46,610 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 from AS
2024-08-12 02:00:15,772 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS
2024-08-12 02:00:16,865 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9650, loss[loss=0.1051, beats_loss=0.01053, ecapa_loss=0.0002133, whisper_loss=0.09242, over 18270.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001859, whisper_loss=0.09259, over 3845719.66 frames. ], batch size: 71, lr: 6.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 02:00:19,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0
2024-08-12 02:00:20,855 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 from AS
2024-08-12 02:00:23,621 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 28 from Vox, 32 from AS
2024-08-12 02:00:29,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1400920.0, ans=0.125
2024-08-12 02:00:35,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1400920.0, ans=0.2
2024-08-12 02:00:45,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1401020.0, ans=0.0
2024-08-12 02:00:45,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1401020.0, ans=0.0
2024-08-12 02:00:56,114 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 from AS
2024-08-12 02:01:00,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.704e+01 3.034e+01 3.483e+01 7.919e+01, threshold=6.068e+01, percent-clipped=1.0
2024-08-12 02:01:00,280 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 27 from Vox, 28 from AS
2024-08-12 02:01:08,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1401120.0, ans=0.0
2024-08-12 02:01:09,665 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 from AS
2024-08-12 02:01:22,444 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-12 02:01:26,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9700, loss[loss=0.1083, beats_loss=0.01111, ecapa_loss=0.0001734, whisper_loss=0.09546, over 16042.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001868, whisper_loss=0.09261, over 3857592.79 frames. ], batch size: 61, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:01:26,774 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-12 02:01:28,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-12 02:01:47,242 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 02:02:20,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1401620.0, ans=0.2 2024-08-12 02:02:24,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1401720.0, ans=22.5 2024-08-12 02:02:27,039 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 02:02:31,583 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 02:02:36,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9750, loss[loss=0.07947, beats_loss=0.01174, ecapa_loss=0.0002452, whisper_loss=0.06528, over 14068.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001868, whisper_loss=0.09188, over 3836707.12 frames. ], batch size: 62, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:02:37,200 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 02:02:45,663 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 02:02:50,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5 2024-08-12 02:03:00,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1401920.0, ans=10.0 2024-08-12 02:03:02,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1401920.0, ans=0.2 2024-08-12 02:03:20,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.192e+01 2.664e+01 3.101e+01 3.565e+01 5.192e+01, threshold=6.201e+01, percent-clipped=0.0 2024-08-12 02:03:20,838 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 02:03:47,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9800, loss[loss=0.09993, beats_loss=0.0108, ecapa_loss=0.0001922, whisper_loss=0.0872, over 17685.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01117, ecapa_loss=0.0001851, whisper_loss=0.0916, over 3837977.31 frames. ], batch size: 70, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:03:48,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1402320.0, ans=0.07 2024-08-12 02:03:49,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1402320.0, ans=0.0 2024-08-12 02:04:16,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1402520.0, ans=0.2 2024-08-12 02:04:17,812 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
30 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 02:04:19,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1402520.0, ans=0.0 2024-08-12 02:04:34,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1402620.0, ans=0.125 2024-08-12 02:04:56,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1402720.0, ans=0.125 2024-08-12 02:04:58,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9850, loss[loss=0.1019, beats_loss=0.01289, ecapa_loss=0.0001717, whisper_loss=0.0873, over 22678.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001854, whisper_loss=0.09224, over 3862395.43 frames. ], batch size: 93, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:05:11,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1402920.0, ans=0.0 2024-08-12 02:05:16,925 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 02:05:22,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1402920.0, ans=0.0 2024-08-12 02:05:32,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1403020.0, ans=0.2 2024-08-12 02:05:37,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1403020.0, ans=0.125 2024-08-12 02:05:39,737 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:05:42,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.518e+01 2.832e+01 3.271e+01 6.017e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 02:05:48,009 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 02:05:58,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1403220.0, ans=0.1 2024-08-12 02:06:07,888 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 02:06:09,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9900, loss[loss=0.1285, beats_loss=0.00936, ecapa_loss=0.0001567, whisper_loss=0.1176, over 16604.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0111, ecapa_loss=0.000185, whisper_loss=0.09296, over 3910864.50 frames. 
], batch size: 62, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:06:34,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1403420.0, ans=0.125 2024-08-12 02:06:58,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1403620.0, ans=0.125 2024-08-12 02:06:58,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1403620.0, ans=0.2 2024-08-12 02:07:11,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1403720.0, ans=0.2 2024-08-12 02:07:14,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1403720.0, ans=0.2 2024-08-12 02:07:15,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=1403720.0, ans=15.0 2024-08-12 02:07:20,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 9950, loss[loss=0.07801, beats_loss=0.01355, ecapa_loss=0.0001537, whisper_loss=0.06292, over 22037.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01117, ecapa_loss=0.0001849, whisper_loss=0.09207, over 3911799.67 frames. 
], batch size: 89, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:07:22,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1403820.0, ans=0.0 2024-08-12 02:07:29,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1403820.0, ans=0.0 2024-08-12 02:07:45,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1403920.0, ans=0.125 2024-08-12 02:07:50,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-12 02:07:52,379 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-12 02:08:03,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.549e+01 2.857e+01 3.293e+01 8.751e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 02:08:06,580 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 02:08:21,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1404220.0, ans=0.1 2024-08-12 02:08:26,284 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 02:08:29,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10000, loss[loss=0.108, beats_loss=0.01021, ecapa_loss=0.0001574, whisper_loss=0.09618, over 23683.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001852, whisper_loss=0.09207, over 3885004.69 frames. 
], batch size: 90, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:08:31,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1404320.0, ans=0.05 2024-08-12 02:08:44,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2024-08-12 02:08:50,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1404420.0, ans=0.125 2024-08-12 02:08:59,263 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 02:09:01,518 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 02:09:07,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-12 02:09:14,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1404620.0, ans=0.2 2024-08-12 02:09:16,029 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 29 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 02:09:44,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10050, loss[loss=0.1144, beats_loss=0.008908, ecapa_loss=0.0001917, whisper_loss=0.1036, over 16673.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001849, whisper_loss=0.09225, over 3896065.25 frames. 
], batch size: 65, lr: 6.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:09:50,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1404820.0, ans=0.0 2024-08-12 02:09:51,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1404820.0, ans=0.125 2024-08-12 02:10:09,003 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 14 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 02:10:12,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405020.0, ans=0.1 2024-08-12 02:10:17,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1405020.0, ans=0.125 2024-08-12 02:10:27,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1405120.0, ans=0.2 2024-08-12 02:10:28,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1405120.0, ans=0.125 2024-08-12 02:10:30,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.648e+01 2.983e+01 3.418e+01 4.523e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 02:10:33,539 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 10 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 02:10:45,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1405220.0, ans=0.0 2024-08-12 02:10:47,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405220.0, ans=0.1 2024-08-12 02:11:02,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10100, loss[loss=0.1195, beats_loss=0.01043, ecapa_loss=0.0001808, whisper_loss=0.1072, over 20374.00 frames. 
], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001845, whisper_loss=0.09217, over 3916864.68 frames. ], batch size: 81, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:11:03,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1405320.0, ans=0.04949747468305833 2024-08-12 02:11:09,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-12 02:11:25,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1405420.0, ans=0.125 2024-08-12 02:11:39,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0 2024-08-12 02:11:45,425 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 31 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 02:11:45,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1405520.0, ans=0.1 2024-08-12 02:11:56,005 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 40 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 02:12:14,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1405720.0, ans=0.0 2024-08-12 02:12:27,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10150, loss[loss=0.1181, beats_loss=0.01038, ecapa_loss=0.0001988, whisper_loss=0.1057, over 22294.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001854, whisper_loss=0.09224, over 3958044.77 frames. 
], batch size: 89, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:12:27,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1405820.0, ans=0.125 2024-08-12 02:12:53,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2024-08-12 02:12:59,892 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 02:13:06,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406020.0, ans=0.1 2024-08-12 02:13:23,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.579e+01 2.918e+01 3.241e+01 4.906e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-12 02:13:41,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=12.0 2024-08-12 02:13:49,316 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 02:13:49,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1406220.0, ans=0.0 2024-08-12 02:14:02,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1406220.0, ans=0.125 2024-08-12 02:14:05,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1406320.0, ans=0.0 2024-08-12 02:14:07,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10200, loss[loss=0.122, beats_loss=0.009509, ecapa_loss=0.0002135, whisper_loss=0.1103, over 19777.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01107, ecapa_loss=0.0001852, whisper_loss=0.09272, over 3958086.06 frames. 
], batch size: 79, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:14:08,056 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 02:14:23,573 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 02:14:45,216 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 02:14:57,617 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 02:15:01,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1406520.0, ans=0.1 2024-08-12 02:15:15,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1406620.0, ans=0.04949747468305833 2024-08-12 02:15:45,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1406720.0, ans=0.125 2024-08-12 02:15:50,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1406720.0, ans=0.125 2024-08-12 02:15:54,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1406720.0, ans=0.0 2024-08-12 02:16:01,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10250, loss[loss=0.1033, beats_loss=0.01102, ecapa_loss=0.0001572, whisper_loss=0.09075, over 20446.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001842, whisper_loss=0.09316, over 3965437.82 frames. 
], batch size: 78, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:16:35,938 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.728e+00 2024-08-12 02:16:41,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.72 vs. limit=10.0 2024-08-12 02:16:53,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1407020.0, ans=10.0 2024-08-12 02:16:56,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-12 02:17:04,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.107e+01 2.647e+01 2.891e+01 3.478e+01 5.936e+01, threshold=5.783e+01, percent-clipped=1.0 2024-08-12 02:17:12,257 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 02:17:13,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1407120.0, ans=0.09899494936611666 2024-08-12 02:17:16,677 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 02:17:22,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1407220.0, ans=0.125 2024-08-12 02:17:37,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1407220.0, ans=0.125 2024-08-12 02:17:42,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. 
limit=15.0 2024-08-12 02:17:43,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10300, loss[loss=0.133, beats_loss=0.009743, ecapa_loss=0.0001816, whisper_loss=0.1215, over 23531.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001837, whisper_loss=0.09281, over 3954125.65 frames. ], batch size: 92, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:17:59,360 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 02:18:01,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1407420.0, ans=0.125 2024-08-12 02:18:11,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1407420.0, ans=0.0 2024-08-12 02:18:16,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1407420.0, ans=0.0 2024-08-12 02:18:16,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1407420.0, ans=0.07 2024-08-12 02:18:28,809 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-12 02:18:35,333 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 02:18:37,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1407520.0, ans=0.125 2024-08-12 02:18:38,518 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.452e-01 2024-08-12 02:18:47,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1407620.0, ans=0.95 2024-08-12 02:18:56,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2024-08-12 02:18:58,436 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 02:19:11,396 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10350, loss[loss=0.09853, beats_loss=0.01218, ecapa_loss=0.0002148, whisper_loss=0.0842, over 21138.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001833, whisper_loss=0.09248, over 3951173.77 frames. ], batch size: 90, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:19:14,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1407820.0, ans=0.125 2024-08-12 02:19:23,211 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 02:19:30,119 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 02:19:41,776 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 02:19:49,478 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 02:19:56,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.600e+01 2.842e+01 3.107e+01 4.520e+01, threshold=5.684e+01, percent-clipped=0.0 2024-08-12 02:20:11,349 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 02:20:25,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10400, loss[loss=0.08611, beats_loss=0.01136, ecapa_loss=0.0001955, whisper_loss=0.07279, over 14085.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001829, whisper_loss=0.09225, over 3888909.73 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:20:28,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1408320.0, ans=0.125 2024-08-12 02:20:30,232 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 02:20:46,991 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 02:20:48,871 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 02:21:09,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1408620.0, ans=0.0 2024-08-12 02:21:27,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1408720.0, ans=0.0 2024-08-12 02:21:37,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10450, loss[loss=0.12, beats_loss=0.009615, ecapa_loss=0.0002044, whisper_loss=0.1083, over 23725.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01109, ecapa_loss=0.0001832, whisper_loss=0.09228, over 3872834.06 frames. 
], batch size: 94, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:22:15,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2024-08-12 02:22:21,859 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.073e+01 2.627e+01 2.925e+01 3.348e+01 4.455e+01, threshold=5.851e+01, percent-clipped=0.0 2024-08-12 02:22:22,181 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 02:22:32,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1409120.0, ans=0.1 2024-08-12 02:22:34,758 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 02:22:38,958 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 02:22:46,869 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 02:22:49,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10500, loss[loss=0.1087, beats_loss=0.01089, ecapa_loss=0.0001971, whisper_loss=0.0958, over 21043.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001847, whisper_loss=0.09242, over 3878376.89 frames. ], batch size: 88, lr: 6.25e-03, grad_scale: 5.764607523034235e+17 2024-08-12 02:22:55,376 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 02:22:59,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. 
limit=15.0 2024-08-12 02:23:07,506 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.842e+02 2024-08-12 02:23:11,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1409420.0, ans=0.2 2024-08-12 02:23:28,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1409520.0, ans=0.0 2024-08-12 02:23:53,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2024-08-12 02:23:55,108 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 02:24:02,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10550, loss[loss=0.1082, beats_loss=0.01378, ecapa_loss=0.0001755, whisper_loss=0.09262, over 21702.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01107, ecapa_loss=0.000184, whisper_loss=0.09274, over 3893921.27 frames. ], batch size: 87, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:24:02,720 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 11 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-12 02:24:04,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1409820.0, ans=0.125 2024-08-12 02:24:13,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-12 02:24:31,140 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
29 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 02:24:39,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1410020.0, ans=0.125 2024-08-12 02:24:41,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1410020.0, ans=0.1 2024-08-12 02:24:41,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1410020.0, ans=0.1 2024-08-12 02:24:46,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.599e+01 2.845e+01 3.296e+01 6.744e+01, threshold=5.691e+01, percent-clipped=1.0 2024-08-12 02:24:49,613 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 02:24:53,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1410120.0, ans=0.125 2024-08-12 02:25:13,077 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10600, loss[loss=0.1371, beats_loss=0.009126, ecapa_loss=0.0001996, whisper_loss=0.126, over 19852.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01104, ecapa_loss=0.000185, whisper_loss=0.09293, over 3898818.63 frames. ], batch size: 78, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:25:32,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1410420.0, ans=0.1 2024-08-12 02:25:37,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1410420.0, ans=0.0 2024-08-12 02:25:38,615 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 30 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-12 02:25:55,386 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 02:26:11,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1410720.0, ans=0.0 2024-08-12 02:26:13,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1410720.0, ans=0.125 2024-08-12 02:26:14,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1410720.0, ans=0.125 2024-08-12 02:26:15,870 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 02:26:22,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10650, loss[loss=0.1281, beats_loss=0.01097, ecapa_loss=0.0001808, whisper_loss=0.1153, over 23115.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01109, ecapa_loss=0.0001831, whisper_loss=0.09267, over 3904973.91 frames. ], batch size: 90, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:26:23,915 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 02:26:32,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1410820.0, ans=0.0 2024-08-12 02:26:50,840 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 02:27:04,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.646e+01 2.959e+01 3.392e+01 4.637e+01, threshold=5.918e+01, percent-clipped=0.0 2024-08-12 02:27:07,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1411120.0, ans=0.0 2024-08-12 02:27:12,630 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 02:27:18,418 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:27:21,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-12 02:27:30,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10700, loss[loss=0.1402, beats_loss=0.007473, ecapa_loss=0.00019, whisper_loss=0.1308, over 23240.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01101, ecapa_loss=0.0001829, whisper_loss=0.09331, over 3888840.55 frames. ], batch size: 89, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:27:32,272 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 02:27:37,925 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 02:27:39,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1411320.0, ans=0.0 2024-08-12 02:27:45,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1411420.0, ans=0.0 2024-08-12 02:27:46,357 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 02:27:48,310 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.158e-02 2024-08-12 02:28:02,095 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 02:28:14,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2024-08-12 02:28:30,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1411720.0, ans=0.0 2024-08-12 02:28:33,632 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 02:28:40,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10750, loss[loss=0.1069, beats_loss=0.008299, ecapa_loss=0.0001974, whisper_loss=0.09664, over 17552.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01101, ecapa_loss=0.000184, whisper_loss=0.0938, over 3895665.23 frames. ], batch size: 69, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:28:44,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1411820.0, ans=0.125 2024-08-12 02:28:46,994 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 02:28:47,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1411820.0, ans=0.0 2024-08-12 02:28:55,571 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.979e-02 2024-08-12 02:29:00,737 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 02:29:13,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.38 vs. 
limit=15.0 2024-08-12 02:29:22,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.596e+01 2.921e+01 3.440e+01 9.548e+01, threshold=5.843e+01, percent-clipped=1.0 2024-08-12 02:29:23,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1412120.0, ans=0.2 2024-08-12 02:29:27,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1412120.0, ans=0.2 2024-08-12 02:29:30,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1412120.0, ans=0.2 2024-08-12 02:29:39,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1412220.0, ans=0.0 2024-08-12 02:29:48,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10800, loss[loss=0.1198, beats_loss=0.01185, ecapa_loss=0.000171, whisper_loss=0.1063, over 22824.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01101, ecapa_loss=0.0001844, whisper_loss=0.09395, over 3935777.78 frames. ], batch size: 91, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:29:55,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1412320.0, ans=0.1 2024-08-12 02:30:01,435 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 02:30:04,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1412420.0, ans=0.0 2024-08-12 02:30:40,551 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 02:30:56,335 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10850, loss[loss=0.08879, beats_loss=0.01379, ecapa_loss=0.0001799, whisper_loss=0.0732, over 17840.00 frames. 
], tot_loss[loss=0.1068, beats_loss=0.01104, ecapa_loss=0.0001844, whisper_loss=0.09395, over 3943151.59 frames. ], batch size: 72, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:31:04,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1412820.0, ans=0.0 2024-08-12 02:31:07,999 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-12 02:31:39,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.032e+01 2.708e+01 3.088e+01 3.544e+01 8.247e+01, threshold=6.177e+01, percent-clipped=2.0 2024-08-12 02:31:41,370 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 02:31:59,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1413220.0, ans=0.035 2024-08-12 02:32:06,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10900, loss[loss=0.1009, beats_loss=0.008918, ecapa_loss=0.0002168, whisper_loss=0.08979, over 16000.00 frames. ], tot_loss[loss=0.107, beats_loss=0.01104, ecapa_loss=0.0001842, whisper_loss=0.09407, over 3930553.08 frames. ], batch size: 67, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:32:07,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1413320.0, ans=0.0 2024-08-12 02:32:10,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1413320.0, ans=0.125 2024-08-12 02:32:14,565 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 40 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 02:32:26,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1413420.0, ans=0.2 2024-08-12 02:32:27,315 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
16 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 02:32:34,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1413520.0, ans=0.125 2024-08-12 02:32:39,256 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-12 02:32:43,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1413520.0, ans=0.125 2024-08-12 02:32:53,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1413620.0, ans=0.0 2024-08-12 02:32:59,101 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 02:33:06,053 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-12 02:33:06,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1413720.0, ans=0.0 2024-08-12 02:33:13,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1413720.0, ans=0.125 2024-08-12 02:33:18,392 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 10950, loss[loss=0.112, beats_loss=0.01163, ecapa_loss=0.0002105, whisper_loss=0.09823, over 20556.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01108, ecapa_loss=0.0001841, whisper_loss=0.09381, over 3935270.57 frames. ], batch size: 88, lr: 6.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:33:18,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1413820.0, ans=0.0 2024-08-12 02:33:20,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.31 vs. 
limit=15.0 2024-08-12 02:33:39,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1413920.0, ans=0.0 2024-08-12 02:33:54,043 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 02:34:00,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.632e+01 3.025e+01 3.424e+01 7.059e+01, threshold=6.051e+01, percent-clipped=1.0 2024-08-12 02:34:04,052 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 02:34:05,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1414120.0, ans=0.125 2024-08-12 02:34:07,036 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 02:34:11,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-08-12 02:34:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1414220.0, ans=0.125 2024-08-12 02:34:26,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=10.0 2024-08-12 02:34:27,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11000, loss[loss=0.1303, beats_loss=0.008831, ecapa_loss=0.0001965, whisper_loss=0.1195, over 18012.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01105, ecapa_loss=0.0001844, whisper_loss=0.0939, over 3933496.23 frames. ], batch size: 74, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:34:33,291 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 29 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 02:34:39,004 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
18 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 02:35:13,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1414620.0, ans=0.125 2024-08-12 02:35:14,055 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 02:35:22,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1414720.0, ans=0.125 2024-08-12 02:35:24,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1414720.0, ans=0.2 2024-08-12 02:35:35,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11050, loss[loss=0.1161, beats_loss=0.009433, ecapa_loss=0.0002173, whisper_loss=0.1045, over 21515.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01103, ecapa_loss=0.000184, whisper_loss=0.09366, over 3909156.33 frames. ], batch size: 89, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:35:56,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1414920.0, ans=0.0 2024-08-12 02:36:06,081 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 02:36:18,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.531e+01 2.878e+01 3.285e+01 6.916e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 02:36:33,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1415220.0, ans=0.0 2024-08-12 02:36:37,131 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 02:36:45,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11100, loss[loss=0.1172, beats_loss=0.0095, ecapa_loss=0.000176, whisper_loss=0.106, over 23279.00 frames. 
], tot_loss[loss=0.1063, beats_loss=0.011, ecapa_loss=0.0001846, whisper_loss=0.09344, over 3911703.73 frames. ], batch size: 90, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:36:53,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1415320.0, ans=0.1 2024-08-12 02:37:17,382 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 02:37:24,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2024-08-12 02:37:26,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1415520.0, ans=0.125 2024-08-12 02:37:46,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1415720.0, ans=0.1 2024-08-12 02:37:49,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-12 02:37:56,012 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11150, loss[loss=0.07261, beats_loss=0.01083, ecapa_loss=0.0002148, whisper_loss=0.05963, over 15209.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01096, ecapa_loss=0.000185, whisper_loss=0.09354, over 3893759.66 frames. ], batch size: 61, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:38:10,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.53 vs. 
limit=15.0 2024-08-12 02:38:39,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.570e+01 2.845e+01 3.196e+01 4.459e+01, threshold=5.690e+01, percent-clipped=0.0 2024-08-12 02:38:42,496 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 02:38:58,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1416220.0, ans=0.125 2024-08-12 02:39:06,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11200, loss[loss=0.102, beats_loss=0.01175, ecapa_loss=0.0002077, whisper_loss=0.0882, over 21298.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01099, ecapa_loss=0.0001838, whisper_loss=0.09318, over 3886784.23 frames. ], batch size: 89, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:39:11,052 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 02:39:12,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1416320.0, ans=0.0 2024-08-12 02:39:18,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1416320.0, ans=0.0 2024-08-12 02:39:18,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. limit=10.0 2024-08-12 02:40:03,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.556e-01 2024-08-12 02:40:12,565 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-12 02:40:12,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1416720.0, ans=0.0 2024-08-12 02:40:16,486 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11250, loss[loss=0.1032, beats_loss=0.01269, ecapa_loss=0.0001675, whisper_loss=0.0888, over 17682.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01092, ecapa_loss=0.0001839, whisper_loss=0.09416, over 3889783.01 frames. ], batch size: 71, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:40:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1416820.0, ans=0.125 2024-08-12 02:40:24,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.08 vs. limit=10.0 2024-08-12 02:40:25,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1416820.0, ans=0.2 2024-08-12 02:40:25,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1416820.0, ans=0.1 2024-08-12 02:40:26,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1416820.0, ans=0.0 2024-08-12 02:40:38,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.06 vs. 
limit=15.0 2024-08-12 02:40:45,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1417020.0, ans=0.125 2024-08-12 02:40:56,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1417120.0, ans=0.125 2024-08-12 02:40:59,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.698e+01 3.076e+01 3.539e+01 6.948e+01, threshold=6.153e+01, percent-clipped=1.0 2024-08-12 02:40:59,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1417120.0, ans=0.0 2024-08-12 02:41:13,851 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 02:41:21,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1417220.0, ans=0.025 2024-08-12 02:41:25,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11300, loss[loss=0.09998, beats_loss=0.01445, ecapa_loss=0.0001169, whisper_loss=0.08436, over 18369.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01087, ecapa_loss=0.000184, whisper_loss=0.09411, over 3889392.24 frames. 
], batch size: 69, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:41:32,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1417320.0, ans=0.125 2024-08-12 02:41:33,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1417320.0, ans=0.025 2024-08-12 02:41:35,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1417320.0, ans=0.1 2024-08-12 02:41:43,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1417420.0, ans=0.2 2024-08-12 02:41:49,069 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 02:41:54,794 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 02:42:07,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1417620.0, ans=0.125 2024-08-12 02:42:12,839 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-12 02:42:26,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1417720.0, ans=0.0 2024-08-12 02:42:28,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=22.5 2024-08-12 02:42:35,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11350, loss[loss=0.1073, beats_loss=0.01139, ecapa_loss=0.0001874, whisper_loss=0.09408, over 22219.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.011, ecapa_loss=0.0001834, whisper_loss=0.09291, over 3895720.68 frames. 
], batch size: 92, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:42:43,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2024-08-12 02:43:18,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.545e+01 2.820e+01 3.202e+01 5.315e+01, threshold=5.639e+01, percent-clipped=0.0 2024-08-12 02:43:26,775 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 02:43:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1418220.0, ans=0.0 2024-08-12 02:43:37,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1418220.0, ans=0.2 2024-08-12 02:43:38,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1418220.0, ans=0.0 2024-08-12 02:43:45,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11400, loss[loss=0.1198, beats_loss=0.01087, ecapa_loss=0.0001911, whisper_loss=0.107, over 23008.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01101, ecapa_loss=0.0001843, whisper_loss=0.0927, over 3854141.83 frames. ], batch size: 88, lr: 6.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:43:58,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1418420.0, ans=0.1 2024-08-12 02:44:13,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0 2024-08-12 02:44:20,830 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 02:44:27,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1418620.0, ans=0.125 2024-08-12 02:44:51,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1418720.0, ans=0.0 2024-08-12 02:44:52,244 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 02:44:53,370 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11450, loss[loss=0.1005, beats_loss=0.01008, ecapa_loss=0.0002109, whisper_loss=0.08833, over 14804.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01101, ecapa_loss=0.000185, whisper_loss=0.09247, over 3872375.98 frames. ], batch size: 61, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:45:14,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1418920.0, ans=0.1 2024-08-12 02:45:17,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1418920.0, ans=0.0 2024-08-12 02:45:33,703 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 02:45:36,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.629e+01 3.024e+01 3.484e+01 5.992e+01, threshold=6.048e+01, percent-clipped=1.0 2024-08-12 02:45:40,943 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 02:45:53,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1419220.0, ans=0.125 2024-08-12 02:46:01,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1419320.0, ans=0.0 2024-08-12 02:46:02,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11500, loss[loss=0.09499, beats_loss=0.01394, ecapa_loss=0.0001796, whisper_loss=0.07925, over 22406.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01104, ecapa_loss=0.0001844, whisper_loss=0.09286, over 3881110.46 frames. ], batch size: 94, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:46:09,290 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 02:46:12,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1419320.0, ans=0.05 2024-08-12 02:46:24,171 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-12 02:46:36,348 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 02:46:48,787 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 02:46:58,467 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 02:47:07,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1419720.0, ans=0.125 2024-08-12 02:47:11,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11550, loss[loss=0.08925, beats_loss=0.009852, ecapa_loss=0.0002079, whisper_loss=0.07732, over 15542.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01103, ecapa_loss=0.0001851, whisper_loss=0.09309, over 3891911.84 frames. 
], batch size: 64, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:47:38,899 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 02:47:39,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1420020.0, ans=0.2 2024-08-12 02:47:51,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-12 02:47:53,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.668e+01 3.016e+01 3.497e+01 6.036e+01, threshold=6.031e+01, percent-clipped=0.0 2024-08-12 02:48:20,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11600, loss[loss=0.0992, beats_loss=0.01018, ecapa_loss=0.0001944, whisper_loss=0.08708, over 14188.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01103, ecapa_loss=0.0001845, whisper_loss=0.09307, over 3878200.84 frames. 
], batch size: 56, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:48:30,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1420320.0, ans=0.125 2024-08-12 02:49:08,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1420620.0, ans=0.125 2024-08-12 02:49:18,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:18,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:27,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:27,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1420720.0, ans=0.125 2024-08-12 02:49:28,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1420820.0, ans=0.0 2024-08-12 02:49:29,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11650, loss[loss=0.1009, beats_loss=0.01158, ecapa_loss=0.0002291, whisper_loss=0.08703, over 22236.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01096, ecapa_loss=0.0001857, whisper_loss=0.09313, over 3899762.39 frames. ], batch size: 92, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:49:41,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1420820.0, ans=0.09899494936611666 2024-08-12 02:49:44,752 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 25 from Vox, 31 from AS 2024-08-12 02:49:47,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1420920.0, ans=0.125 2024-08-12 02:49:56,960 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 25 from Vox, 32 from AS 2024-08-12 02:50:06,755 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 from AS 2024-08-12 02:50:09,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1421020.0, ans=15.0 2024-08-12 02:50:12,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.632e+01 2.905e+01 3.202e+01 4.413e+01, threshold=5.810e+01, percent-clipped=0.0 2024-08-12 02:50:15,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-12 02:50:17,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1421120.0, ans=0.0 2024-08-12 02:50:24,636 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 from AS 2024-08-12 02:50:29,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2024-08-12 02:50:38,283 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11700, loss[loss=0.07512, beats_loss=0.01251, ecapa_loss=0.0002404, whisper_loss=0.0602, over 15083.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01105, ecapa_loss=0.0001856, whisper_loss=0.09242, over 3919728.32 frames. ], batch size: 67, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:50:53,170 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
17 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 02:50:54,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1421420.0, ans=0.1 2024-08-12 02:51:06,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1421520.0, ans=0.0 2024-08-12 02:51:15,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1421520.0, ans=0.1 2024-08-12 02:51:23,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1421620.0, ans=15.0 2024-08-12 02:51:41,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1421720.0, ans=0.2 2024-08-12 02:51:43,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1421720.0, ans=0.125 2024-08-12 02:51:44,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1421720.0, ans=0.125 2024-08-12 02:51:46,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11750, loss[loss=0.101, beats_loss=0.01271, ecapa_loss=0.0001541, whisper_loss=0.08676, over 23503.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01111, ecapa_loss=0.0001864, whisper_loss=0.09293, over 3931274.55 frames. 
], batch size: 91, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:51:50,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1421820.0, ans=0.125 2024-08-12 02:51:55,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1421820.0, ans=0.125 2024-08-12 02:52:00,827 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 27 from Vox, 32 from AS 2024-08-12 02:52:07,159 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS 2024-08-12 02:52:19,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2024-08-12 02:52:29,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.531e+01 2.844e+01 3.355e+01 7.826e+01, threshold=5.688e+01, percent-clipped=1.0 2024-08-12 02:52:36,427 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 from AS 2024-08-12 02:52:46,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=12.0 2024-08-12 02:52:55,163 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11800, loss[loss=0.1184, beats_loss=0.0104, ecapa_loss=0.0002132, whisper_loss=0.1058, over 21683.00 frames. ], tot_loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001872, whisper_loss=0.09301, over 3928948.03 frames. 
], batch size: 89, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:53:06,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1422320.0, ans=0.125 2024-08-12 02:53:17,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1422420.0, ans=0.2 2024-08-12 02:53:19,959 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 20 from LS+wenet, 26 from Vox, 41 from AS 2024-08-12 02:53:24,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1422520.0, ans=0.125 2024-08-12 02:53:27,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1422520.0, ans=0.125 2024-08-12 02:53:28,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1422520.0, ans=10.0 2024-08-12 02:53:38,533 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS 2024-08-12 02:53:58,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1422720.0, ans=0.0 2024-08-12 02:54:04,589 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11850, loss[loss=0.1072, beats_loss=0.01052, ecapa_loss=0.0002221, whisper_loss=0.09442, over 18652.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01109, ecapa_loss=0.0001876, whisper_loss=0.09318, over 3946194.77 frames. ], batch size: 78, lr: 6.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:54:09,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=12.0 2024-08-12 02:54:22,617 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
34 from LS+wenet, 28 from Vox, 29 from AS 2024-08-12 02:54:27,989 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 from AS 2024-08-12 02:54:47,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.632e+01 2.955e+01 3.333e+01 2.077e+02, threshold=5.910e+01, percent-clipped=1.0 2024-08-12 02:55:12,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11900, loss[loss=0.101, beats_loss=0.01398, ecapa_loss=0.0001808, whisper_loss=0.08517, over 21927.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.01105, ecapa_loss=0.0001872, whisper_loss=0.09371, over 3964193.22 frames. ], batch size: 93, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:55:41,662 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-12 02:55:41,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1423520.0, ans=0.125 2024-08-12 02:56:07,405 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 16 from LS+wenet, 32 from Vox, 34 from AS 2024-08-12 02:56:22,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 11950, loss[loss=0.1344, beats_loss=0.009191, ecapa_loss=0.0001534, whisper_loss=0.1237, over 24007.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01102, ecapa_loss=0.0001865, whisper_loss=0.09334, over 3932275.63 frames. ], batch size: 90, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:56:26,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.59 vs. 
limit=10.0 2024-08-12 02:56:32,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1423820.0, ans=0.125 2024-08-12 02:56:33,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1423820.0, ans=0.2 2024-08-12 02:56:47,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1423920.0, ans=0.0 2024-08-12 02:56:57,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-12 02:57:05,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1424120.0, ans=0.0 2024-08-12 02:57:06,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.496e+01 2.723e+01 3.288e+01 6.365e+01, threshold=5.445e+01, percent-clipped=1.0 2024-08-12 02:57:14,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1424120.0, ans=0.2 2024-08-12 02:57:18,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1424220.0, ans=0.0 2024-08-12 02:57:25,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2024-08-12 02:57:26,522 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.585e-01 2024-08-12 02:57:26,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. 
limit=15.0 2024-08-12 02:57:31,485 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12000, loss[loss=0.1009, beats_loss=0.01362, ecapa_loss=0.0001503, whisper_loss=0.08575, over 20518.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01121, ecapa_loss=0.0001858, whisper_loss=0.09129, over 3894807.71 frames. ], batch size: 83, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 02:57:31,486 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 02:58:10,712 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006161, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 02:58:28,838 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on SV_voxceleb1: loss=0.005027, beats_loss=0, ecapa_loss=0.0005027, whisper_loss=0, over 939242.00 frames. 2024-08-12 03:00:26,463 INFO [train_multi_KD3.py:1149] (0/4) Epoch 10, validation on AT_audioset: loss=0.02469, beats_loss=0.02469, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 03:00:26,467 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 03:00:54,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1424520.0, ans=0.07 2024-08-12 03:00:59,584 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 from AS 2024-08-12 03:00:59,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1424520.0, ans=0.1 2024-08-12 03:01:09,209 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS 2024-08-12 03:01:12,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.60 vs. 
limit=15.0 2024-08-12 03:01:14,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=22.5 2024-08-12 03:01:30,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2024-08-12 03:01:31,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=12.0 2024-08-12 03:01:36,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12050, loss[loss=0.1048, beats_loss=0.01179, ecapa_loss=0.0001834, whisper_loss=0.09118, over 23211.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01133, ecapa_loss=0.0001848, whisper_loss=0.09084, over 3887093.10 frames. ], batch size: 92, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:01:43,411 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 03:02:11,160 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 03:02:11,541 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.900e-01 2024-08-12 03:02:15,343 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 from AS 2024-08-12 03:02:19,396 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.644e+01 2.915e+01 3.248e+01 4.728e+01, threshold=5.830e+01, percent-clipped=0.0 2024-08-12 03:02:24,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1425120.0, ans=0.125 2024-08-12 03:02:25,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. 
limit=10.0 2024-08-12 03:02:27,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.80 vs. limit=15.0 2024-08-12 03:02:42,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1425220.0, ans=0.1 2024-08-12 03:02:45,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12100, loss[loss=0.09922, beats_loss=0.01037, ecapa_loss=0.0002429, whisper_loss=0.08642, over 18452.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01127, ecapa_loss=0.000186, whisper_loss=0.09129, over 3887053.23 frames. ], batch size: 81, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:02:51,431 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 from AS 2024-08-12 03:02:53,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1425320.0, ans=0.025 2024-08-12 03:03:03,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1425420.0, ans=0.2 2024-08-12 03:03:05,608 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 30 from Vox, 24 from AS 2024-08-12 03:03:12,397 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.068e-01 2024-08-12 03:03:24,421 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 30 from Vox, 34 from AS 2024-08-12 03:03:51,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1425720.0, ans=0.0 2024-08-12 03:03:54,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12150, loss[loss=0.1171, beats_loss=0.009779, ecapa_loss=0.0002348, whisper_loss=0.1049, over 21991.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01126, ecapa_loss=0.0001871, whisper_loss=0.09123, over 3878804.18 frames. ], batch size: 90, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:04:38,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.674e+01 3.067e+01 3.443e+01 6.340e+01, threshold=6.135e+01, percent-clipped=1.0 2024-08-12 03:04:42,447 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 03:04:51,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=8.0 2024-08-12 03:04:54,772 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 03:05:02,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1426220.0, ans=0.0 2024-08-12 03:05:04,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12200, loss[loss=0.1184, beats_loss=0.009256, ecapa_loss=0.0002188, whisper_loss=0.1069, over 21232.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001867, whisper_loss=0.09197, over 3916839.15 frames. 
], batch size: 84, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:05:20,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1426420.0, ans=0.0 2024-08-12 03:05:38,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1426520.0, ans=0.2 2024-08-12 03:05:38,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1426520.0, ans=0.0 2024-08-12 03:05:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1426520.0, ans=0.125 2024-08-12 03:05:45,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1426620.0, ans=0.0 2024-08-12 03:05:49,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1426620.0, ans=0.1 2024-08-12 03:05:52,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1426620.0, ans=0.125 2024-08-12 03:05:59,598 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.761e-01 2024-08-12 03:06:01,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1426720.0, ans=0.1 2024-08-12 03:06:13,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12250, loss[loss=0.1001, beats_loss=0.008417, ecapa_loss=0.0001992, whisper_loss=0.08969, over 13448.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01113, ecapa_loss=0.0001855, whisper_loss=0.09226, over 3913822.19 frames. 
], batch size: 55, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:06:19,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1426820.0, ans=0.1 2024-08-12 03:06:31,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1426920.0, ans=0.125 2024-08-12 03:06:56,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.672e+01 2.930e+01 3.249e+01 5.324e+01, threshold=5.861e+01, percent-clipped=0.0 2024-08-12 03:07:02,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1427120.0, ans=0.0 2024-08-12 03:07:16,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1427220.0, ans=0.5 2024-08-12 03:07:23,274 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12300, loss[loss=0.1164, beats_loss=0.009293, ecapa_loss=0.000249, whisper_loss=0.1046, over 14805.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001866, whisper_loss=0.09235, over 3889184.33 frames. ], batch size: 61, lr: 6.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:07:33,429 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 18 from Vox, 46 from AS 2024-08-12 03:07:34,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1427320.0, ans=0.07 2024-08-12 03:07:40,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1427420.0, ans=0.0 2024-08-12 03:07:40,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. 
limit=22.5 2024-08-12 03:07:54,864 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-12 03:08:00,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1427520.0, ans=0.125 2024-08-12 03:08:13,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1427620.0, ans=0.0 2024-08-12 03:08:23,551 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 from AS 2024-08-12 03:08:23,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1427720.0, ans=0.125 2024-08-12 03:08:32,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12350, loss[loss=0.1088, beats_loss=0.009403, ecapa_loss=0.000198, whisper_loss=0.09738, over 17858.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01118, ecapa_loss=0.0001866, whisper_loss=0.09223, over 3895362.41 frames. ], batch size: 72, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:08:32,853 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 from AS 2024-08-12 03:09:00,876 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 from AS 2024-08-12 03:09:16,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1428120.0, ans=0.125 2024-08-12 03:09:18,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.674e+01 3.021e+01 3.383e+01 7.125e+01, threshold=6.043e+01, percent-clipped=2.0 2024-08-12 03:09:29,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1428120.0, ans=0.1 2024-08-12 03:09:44,808 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
21 from LS+wenet, 26 from Vox, 34 from AS 2024-08-12 03:09:47,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12400, loss[loss=0.1255, beats_loss=0.008441, ecapa_loss=0.000193, whisper_loss=0.1151, over 23239.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01109, ecapa_loss=0.0001858, whisper_loss=0.09273, over 3917472.06 frames. ], batch size: 92, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:09:48,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1428320.0, ans=0.125 2024-08-12 03:10:10,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1428420.0, ans=0.125 2024-08-12 03:10:29,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1428520.0, ans=0.09899494936611666 2024-08-12 03:10:32,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1428620.0, ans=0.05 2024-08-12 03:10:48,961 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 03:10:52,462 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 from AS 2024-08-12 03:10:54,138 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:10:56,691 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 03:11:01,404 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 03:11:02,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12450, loss[loss=0.1144, beats_loss=0.009882, ecapa_loss=0.0001712, whisper_loss=0.1028, over 17346.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001848, whisper_loss=0.09217, over 3901581.38 frames. ], batch size: 67, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:11:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1428820.0, ans=0.125 2024-08-12 03:11:15,858 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 from AS 2024-08-12 03:11:17,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1428920.0, ans=0.125 2024-08-12 03:11:36,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1429020.0, ans=0.125 2024-08-12 03:11:46,535 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+01 2.502e+01 2.764e+01 3.282e+01 5.590e+01, threshold=5.528e+01, percent-clipped=0.0 2024-08-12 03:11:55,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1429120.0, ans=0.125 2024-08-12 03:12:05,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1429220.0, ans=0.2 2024-08-12 03:12:05,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1429220.0, ans=0.125 2024-08-12 03:12:13,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1429320.0, ans=0.1 2024-08-12 03:12:14,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12500, loss[loss=0.0845, beats_loss=0.01053, ecapa_loss=0.0001955, whisper_loss=0.07201, over 21005.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01098, ecapa_loss=0.0001855, whisper_loss=0.09276, over 3907270.37 frames. 
], batch size: 88, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:12:16,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1429320.0, ans=0.0 2024-08-12 03:12:20,930 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 from AS 2024-08-12 03:12:26,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1429320.0, ans=0.0 2024-08-12 03:12:29,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-08-12 03:12:32,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1429420.0, ans=0.125 2024-08-12 03:12:33,155 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 from AS 2024-08-12 03:12:52,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1429520.0, ans=0.09899494936611666 2024-08-12 03:13:04,391 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 31 from Vox, 33 from AS 2024-08-12 03:13:10,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2024-08-12 03:13:19,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1429720.0, ans=0.0 2024-08-12 03:13:22,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1429720.0, ans=0.04949747468305833 2024-08-12 03:13:27,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12550, loss[loss=0.1035, beats_loss=0.01099, ecapa_loss=0.0001869, whisper_loss=0.09069, over 15278.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01105, ecapa_loss=0.0001845, whisper_loss=0.09272, over 3904533.64 frames. ], batch size: 60, lr: 6.20e-03, grad_scale: 2.305843009213694e+18 2024-08-12 03:13:29,951 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 from AS 2024-08-12 03:13:46,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=12.0 2024-08-12 03:13:59,529 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 03:14:12,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+01 2.663e+01 2.938e+01 3.317e+01 5.229e+01, threshold=5.876e+01, percent-clipped=0.0 2024-08-12 03:14:22,962 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 from AS 2024-08-12 03:14:28,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1430220.0, ans=0.125 2024-08-12 03:14:28,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1430220.0, ans=0.025 2024-08-12 03:14:34,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1430220.0, ans=0.0 2024-08-12 03:14:38,727 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12600, loss[loss=0.1054, beats_loss=0.01196, ecapa_loss=0.0001786, whisper_loss=0.09166, over 22465.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01111, ecapa_loss=0.0001836, whisper_loss=0.09252, over 3907809.32 frames. ], batch size: 91, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:14:46,696 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
25 from LS+wenet, 15 from Vox, 41 from AS 2024-08-12 03:14:53,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1430420.0, ans=0.0 2024-08-12 03:14:55,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1430420.0, ans=0.1 2024-08-12 03:14:57,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1430420.0, ans=0.125 2024-08-12 03:15:02,448 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 from AS 2024-08-12 03:15:18,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1430520.0, ans=0.2 2024-08-12 03:15:24,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1430620.0, ans=0.1 2024-08-12 03:15:30,501 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 03:15:33,109 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 from AS 2024-08-12 03:15:41,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1430720.0, ans=0.0 2024-08-12 03:15:52,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12650, loss[loss=0.1065, beats_loss=0.01075, ecapa_loss=0.0002223, whisper_loss=0.09355, over 17898.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01118, ecapa_loss=0.000184, whisper_loss=0.09203, over 3891344.02 frames. ], batch size: 73, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:15:57,157 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 03:15:57,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1430820.0, ans=0.125 2024-08-12 03:16:06,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1430920.0, ans=0.0 2024-08-12 03:16:09,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1430920.0, ans=0.125 2024-08-12 03:16:14,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1430920.0, ans=0.1 2024-08-12 03:16:33,003 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 03:16:33,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1431020.0, ans=0.07 2024-08-12 03:16:37,551 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 03:16:38,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.672e+01 3.119e+01 3.630e+01 6.657e+01, threshold=6.239e+01, percent-clipped=2.0 2024-08-12 03:16:49,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1431120.0, ans=0.125 2024-08-12 03:16:50,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1431220.0, ans=0.2 2024-08-12 03:16:58,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.26 vs. limit=15.0 2024-08-12 03:17:05,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12700, loss[loss=0.1186, beats_loss=0.0106, ecapa_loss=0.0001474, whisper_loss=0.1065, over 17800.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01119, ecapa_loss=0.000184, whisper_loss=0.09155, over 3863314.35 frames. ], batch size: 67, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:17:21,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1431420.0, ans=0.125 2024-08-12 03:17:27,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1431420.0, ans=0.1 2024-08-12 03:17:49,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1431620.0, ans=0.2 2024-08-12 03:18:00,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1431620.0, ans=0.1 2024-08-12 03:18:06,713 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 03:18:18,312 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12750, loss[loss=0.1052, beats_loss=0.009212, ecapa_loss=0.000252, whisper_loss=0.09348, over 13940.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01122, ecapa_loss=0.0001832, whisper_loss=0.0923, over 3884072.54 frames. 
], batch size: 60, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:18:29,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1431820.0, ans=0.0 2024-08-12 03:18:45,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1432020.0, ans=0.125 2024-08-12 03:19:02,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.558e+01 2.840e+01 3.489e+01 4.506e+01, threshold=5.680e+01, percent-clipped=0.0 2024-08-12 03:19:10,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1432120.0, ans=0.125 2024-08-12 03:19:29,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12800, loss[loss=0.1104, beats_loss=0.0114, ecapa_loss=0.0002115, whisper_loss=0.09691, over 21449.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01128, ecapa_loss=0.0001851, whisper_loss=0.09212, over 3899359.66 frames. ], batch size: 90, lr: 6.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:19:31,822 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 03:19:38,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1432320.0, ans=0.07 2024-08-12 03:19:45,300 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-12 03:20:12,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1432620.0, ans=0.125 2024-08-12 03:20:13,189 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 03:20:24,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.71 vs. 
limit=15.0 2024-08-12 03:20:39,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12850, loss[loss=0.1212, beats_loss=0.009709, ecapa_loss=0.0001691, whisper_loss=0.1098, over 17035.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01137, ecapa_loss=0.000184, whisper_loss=0.09176, over 3885937.18 frames. ], batch size: 67, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:20:39,631 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 03:20:41,247 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 03:20:41,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1432820.0, ans=0.0 2024-08-12 03:21:07,486 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 03:21:09,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.96 vs. limit=22.5 2024-08-12 03:21:15,443 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 36 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 03:21:22,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1433120.0, ans=0.0 2024-08-12 03:21:23,442 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.483e+01 2.799e+01 3.175e+01 4.760e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-12 03:21:34,983 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-12 03:21:44,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.82 vs. 
limit=22.5 2024-08-12 03:21:45,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1433220.0, ans=0.1 2024-08-12 03:21:48,363 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12900, loss[loss=0.1281, beats_loss=0.009228, ecapa_loss=0.0001614, whisper_loss=0.1173, over 18522.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01134, ecapa_loss=0.0001848, whisper_loss=0.09141, over 3881931.65 frames. ], batch size: 72, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:21:48,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1433320.0, ans=0.0 2024-08-12 03:21:55,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1433320.0, ans=0.125 2024-08-12 03:22:07,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1433420.0, ans=0.125 2024-08-12 03:22:10,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0 2024-08-12 03:22:12,791 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 03:22:58,935 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 12950, loss[loss=0.09924, beats_loss=0.009772, ecapa_loss=0.0002265, whisper_loss=0.08721, over 18737.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01124, ecapa_loss=0.0001847, whisper_loss=0.09176, over 3863582.07 frames. 
], batch size: 79, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:23:12,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1433920.0, ans=0.125 2024-08-12 03:23:19,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1433920.0, ans=0.125 2024-08-12 03:23:20,976 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 03:23:21,311 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:23:22,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:24,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1433920.0, ans=0.0 2024-08-12 03:23:25,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1433920.0, ans=0.025 2024-08-12 03:23:27,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1434020.0, ans=0.1 2024-08-12 03:23:32,233 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 03:23:33,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1434020.0, ans=0.0 2024-08-12 03:23:45,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.584e+01 3.018e+01 3.555e+01 5.734e+01, threshold=6.036e+01, percent-clipped=3.0 2024-08-12 03:24:01,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1434220.0, ans=0.5 2024-08-12 03:24:11,209 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13000, loss[loss=0.1072, beats_loss=0.01054, ecapa_loss=0.0002389, whisper_loss=0.09424, over 20904.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01124, ecapa_loss=0.0001855, whisper_loss=0.09151, over 3858915.62 frames. ], batch size: 89, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:24:32,137 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 03:24:43,832 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 03:24:44,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1434520.0, ans=0.0 2024-08-12 03:24:45,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1434520.0, ans=0.1 2024-08-12 03:25:02,399 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 03:25:05,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1434620.0, ans=0.125 2024-08-12 03:25:10,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1434720.0, ans=0.0 2024-08-12 03:25:11,296 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 03:25:24,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13050, loss[loss=0.1166, beats_loss=0.009524, ecapa_loss=0.0001546, whisper_loss=0.1055, over 17259.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.0001846, whisper_loss=0.09162, over 3866352.85 frames. ], batch size: 63, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:25:42,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1434920.0, ans=0.2 2024-08-12 03:25:44,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2024-08-12 03:25:50,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. 
limit=22.5 2024-08-12 03:25:52,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1435020.0, ans=0.125 2024-08-12 03:25:56,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1435020.0, ans=0.0 2024-08-12 03:26:02,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1435020.0, ans=0.2 2024-08-12 03:26:05,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1435020.0, ans=0.125 2024-08-12 03:26:05,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-08-12 03:26:12,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+01 2.574e+01 2.930e+01 3.375e+01 4.949e+01, threshold=5.859e+01, percent-clipped=0.0 2024-08-12 03:26:31,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-12 03:26:37,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1435220.0, ans=0.0 2024-08-12 03:26:40,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1435320.0, ans=0.1 2024-08-12 03:26:41,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13100, loss[loss=0.1019, beats_loss=0.01315, ecapa_loss=0.0001546, whisper_loss=0.08717, over 23540.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01121, ecapa_loss=0.0001835, whisper_loss=0.09177, over 3878201.89 frames. 
], batch size: 94, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:26:45,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1435320.0, ans=0.2 2024-08-12 03:26:45,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1435320.0, ans=0.125 2024-08-12 03:26:49,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1435320.0, ans=0.125 2024-08-12 03:26:59,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1435420.0, ans=0.125 2024-08-12 03:27:13,462 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 03:27:31,313 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 35 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 03:27:47,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. limit=10.0 2024-08-12 03:27:52,348 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 03:27:54,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1435720.0, ans=0.0 2024-08-12 03:27:56,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13150, loss[loss=0.1477, beats_loss=0.007128, ecapa_loss=0.000183, whisper_loss=0.1388, over 20798.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01118, ecapa_loss=0.0001838, whisper_loss=0.09163, over 3880340.04 frames. ], batch size: 76, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:27:58,435 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 03:28:16,129 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-12 03:28:16,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1435920.0, ans=0.125 2024-08-12 03:28:18,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-12 03:28:19,163 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 03:28:38,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-12 03:28:43,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+01 2.467e+01 2.835e+01 3.173e+01 4.953e+01, threshold=5.670e+01, percent-clipped=0.0 2024-08-12 03:28:43,361 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 33 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-12 03:28:46,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1436120.0, ans=0.0 2024-08-12 03:29:02,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.85 vs. 
limit=10.0 2024-08-12 03:29:06,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1436220.0, ans=0.0 2024-08-12 03:29:07,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1436320.0, ans=0.0 2024-08-12 03:29:07,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1436320.0, ans=0.04949747468305833 2024-08-12 03:29:08,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13200, loss[loss=0.09723, beats_loss=0.01026, ecapa_loss=0.0002088, whisper_loss=0.08488, over 22568.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01114, ecapa_loss=0.0001827, whisper_loss=0.09199, over 3869572.43 frames. ], batch size: 92, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:29:13,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1436320.0, ans=0.0 2024-08-12 03:29:15,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1436320.0, ans=0.125 2024-08-12 03:29:15,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1436320.0, ans=0.0 2024-08-12 03:29:25,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1436420.0, ans=0.125 2024-08-12 03:29:36,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1436520.0, ans=0.125 2024-08-12 03:29:39,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1436520.0, ans=0.125 2024-08-12 03:30:02,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, 
num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-12 03:30:22,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13250, loss[loss=0.1005, beats_loss=0.01426, ecapa_loss=0.0001761, whisper_loss=0.08444, over 21980.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01118, ecapa_loss=0.0001837, whisper_loss=0.09167, over 3888963.06 frames. ], batch size: 91, lr: 6.19e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:30:33,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-12 03:30:36,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1436920.0, ans=0.2 2024-08-12 03:30:46,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1436920.0, ans=0.0 2024-08-12 03:31:10,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.496e+01 2.755e+01 3.152e+01 5.278e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-12 03:31:14,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-12 03:31:15,729 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:31:17,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1437120.0, ans=0.0 2024-08-12 03:31:27,454 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 03:31:36,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1437320.0, ans=0.0 2024-08-12 03:31:37,500 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13300, loss[loss=0.1276, beats_loss=0.00881, ecapa_loss=0.0001822, whisper_loss=0.117, over 22827.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001832, whisper_loss=0.09221, over 3892976.20 frames. ], batch size: 89, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:31:38,977 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 03:31:41,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=12.0 2024-08-12 03:31:45,357 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-12 03:31:57,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1437420.0, ans=0.125 2024-08-12 03:32:12,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2024-08-12 03:32:19,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1437520.0, ans=0.125 2024-08-12 03:32:32,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2024-08-12 03:32:49,604 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
23 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 03:32:50,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13350, loss[loss=0.09205, beats_loss=0.01417, ecapa_loss=0.0001845, whisper_loss=0.07603, over 22284.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01115, ecapa_loss=0.0001829, whisper_loss=0.09252, over 3925720.22 frames. ], batch size: 93, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:32:51,039 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-12 03:32:52,273 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 17 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-12 03:32:52,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1437820.0, ans=0.1 2024-08-12 03:32:55,715 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 03:33:00,251 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-12 03:33:01,601 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 03:33:04,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1437920.0, ans=0.125 2024-08-12 03:33:09,294 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 03:33:14,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. 
limit=15.0 2024-08-12 03:33:24,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1438020.0, ans=0.125 2024-08-12 03:33:34,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1438120.0, ans=0.125 2024-08-12 03:33:37,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.851e+01 3.185e+01 1.772e+02, threshold=5.702e+01, percent-clipped=1.0 2024-08-12 03:33:51,114 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 03:33:54,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1438220.0, ans=0.125 2024-08-12 03:33:56,970 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 03:34:04,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13400, loss[loss=0.1151, beats_loss=0.01242, ecapa_loss=0.0001653, whisper_loss=0.101, over 22805.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01113, ecapa_loss=0.0001831, whisper_loss=0.09234, over 3901387.04 frames. ], batch size: 88, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:34:08,682 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 03:34:19,085 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 03:34:20,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 03:34:53,848 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 03:35:04,878 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
31 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 03:35:09,250 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 03:35:14,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1438820.0, ans=0.0 2024-08-12 03:35:15,838 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13450, loss[loss=0.1186, beats_loss=0.009201, ecapa_loss=0.0002033, whisper_loss=0.1073, over 20977.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.0001842, whisper_loss=0.09238, over 3871818.79 frames. ], batch size: 83, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:35:20,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1438820.0, ans=0.0 2024-08-12 03:35:31,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1438920.0, ans=0.0 2024-08-12 03:35:31,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-12 03:35:36,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-12 03:35:49,095 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 03:35:50,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1439020.0, ans=0.125 2024-08-12 03:36:02,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.531e+01 2.871e+01 3.206e+01 5.320e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-12 03:36:13,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1439220.0, ans=0.125 2024-08-12 03:36:14,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1439220.0, ans=0.125 2024-08-12 03:36:21,871 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 03:36:26,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1439220.0, ans=0.1 2024-08-12 03:36:28,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1439320.0, ans=0.125 2024-08-12 03:36:29,190 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13500, loss[loss=0.07625, beats_loss=0.01235, ecapa_loss=0.0002289, whisper_loss=0.06161, over 17563.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01102, ecapa_loss=0.0001854, whisper_loss=0.09278, over 3877291.54 frames. ], batch size: 74, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:37:00,622 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 03:37:14,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1439620.0, ans=0.0 2024-08-12 03:37:31,765 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 03:37:36,111 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 03:37:41,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13550, loss[loss=0.1152, beats_loss=0.008648, ecapa_loss=0.0001624, whisper_loss=0.1049, over 19125.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01109, ecapa_loss=0.0001846, whisper_loss=0.09261, over 3910375.52 frames. ], batch size: 73, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:37:48,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1439820.0, ans=0.125 2024-08-12 03:37:57,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-12 03:38:05,092 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-144000.pt 2024-08-12 03:38:21,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.20 vs. 
limit=10.0 2024-08-12 03:38:22,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1440020.0, ans=0.125 2024-08-12 03:38:27,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1440120.0, ans=0.0 2024-08-12 03:38:28,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.568e+01 2.866e+01 3.422e+01 5.610e+01, threshold=5.733e+01, percent-clipped=0.0 2024-08-12 03:38:48,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1440220.0, ans=0.2 2024-08-12 03:38:53,353 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13600, loss[loss=0.08433, beats_loss=0.01188, ecapa_loss=0.0002106, whisper_loss=0.07034, over 21443.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01105, ecapa_loss=0.0001842, whisper_loss=0.09292, over 3921134.05 frames. ], batch size: 95, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:38:58,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1440320.0, ans=0.0 2024-08-12 03:38:59,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1440320.0, ans=0.125 2024-08-12 03:39:08,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1440420.0, ans=0.125 2024-08-12 03:39:12,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1440420.0, ans=0.1 2024-08-12 03:39:14,653 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 03:39:26,789 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 03:39:31,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-12 03:39:44,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1440620.0, ans=0.0 2024-08-12 03:39:45,651 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-12 03:40:01,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1440720.0, ans=0.0 2024-08-12 03:40:05,135 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13650, loss[loss=0.09398, beats_loss=0.01063, ecapa_loss=0.0001566, whisper_loss=0.08179, over 17523.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01103, ecapa_loss=0.000185, whisper_loss=0.09313, over 3897199.07 frames. ], batch size: 67, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:40:30,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-12 03:40:50,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.520e+01 2.826e+01 3.243e+01 5.319e+01, threshold=5.652e+01, percent-clipped=0.0 2024-08-12 03:40:57,369 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 03:40:58,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1441120.0, ans=0.1 2024-08-12 03:40:58,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1441120.0, ans=0.1 2024-08-12 03:41:02,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1441220.0, ans=0.125 2024-08-12 03:41:07,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1441220.0, ans=0.0 2024-08-12 03:41:16,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1441320.0, ans=0.125 2024-08-12 03:41:17,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13700, loss[loss=0.1274, beats_loss=0.009871, ecapa_loss=0.0001742, whisper_loss=0.1158, over 22023.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01102, ecapa_loss=0.0001852, whisper_loss=0.09367, over 3883567.81 frames. ], batch size: 84, lr: 6.18e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:41:19,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1441320.0, ans=0.125 2024-08-12 03:41:19,289 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:41:19,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. 
limit=15.0 2024-08-12 03:41:24,749 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 03:41:39,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1441420.0, ans=0.125 2024-08-12 03:41:47,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1441520.0, ans=0.125 2024-08-12 03:41:53,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1441520.0, ans=0.0 2024-08-12 03:42:15,006 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 03:42:17,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1441720.0, ans=0.125 2024-08-12 03:42:27,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13750, loss[loss=0.12, beats_loss=0.009376, ecapa_loss=0.0001886, whisper_loss=0.1087, over 20838.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.0111, ecapa_loss=0.0001839, whisper_loss=0.09337, over 3869356.86 frames. ], batch size: 81, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:42:31,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1441820.0, ans=0.0 2024-08-12 03:42:33,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1441820.0, ans=0.125 2024-08-12 03:42:56,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1442020.0, ans=0.0 2024-08-12 03:42:59,173 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 03:43:01,741 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 03:43:11,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.531e+01 2.738e+01 3.278e+01 4.185e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 03:43:17,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1442120.0, ans=0.0 2024-08-12 03:43:31,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-08-12 03:43:34,820 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 03:43:35,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1442220.0, ans=0.125 2024-08-12 03:43:37,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2024-08-12 03:43:38,547 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13800, loss[loss=0.1011, beats_loss=0.009295, ecapa_loss=0.0002127, whisper_loss=0.08971, over 15569.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01109, ecapa_loss=0.0001845, whisper_loss=0.09313, over 3876003.89 frames. ], batch size: 60, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:43:41,490 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 03:43:53,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-12 03:44:07,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=12.0 2024-08-12 03:44:11,261 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 03:44:13,095 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 03:44:14,589 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 03:44:21,519 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 03:44:32,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-08-12 03:44:40,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1442720.0, ans=0.125 2024-08-12 03:44:51,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13850, loss[loss=0.1113, beats_loss=0.01019, ecapa_loss=0.0001901, whisper_loss=0.09922, over 23314.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01101, ecapa_loss=0.0001846, whisper_loss=0.09353, over 3888433.96 frames. ], batch size: 93, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:44:56,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1442820.0, ans=0.1 2024-08-12 03:45:13,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1442920.0, ans=0.0 2024-08-12 03:45:15,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1442920.0, ans=0.0 2024-08-12 03:45:20,285 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 9 from Vox, 33 fro AS 2024-08-12 03:45:21,674 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-12 03:45:24,893 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 03:45:26,806 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-12 03:45:33,835 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 03:45:38,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.591e+01 3.040e+01 3.441e+01 5.923e+01, threshold=6.079e+01, percent-clipped=1.0 2024-08-12 03:46:04,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13900, loss[loss=0.1048, beats_loss=0.01038, ecapa_loss=0.0002155, whisper_loss=0.0923, over 18320.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01108, ecapa_loss=0.0001844, whisper_loss=0.09309, over 3895007.88 frames. ], batch size: 78, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:46:07,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1443320.0, ans=0.1 2024-08-12 03:46:10,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1443320.0, ans=0.125 2024-08-12 03:46:13,005 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 03:47:14,608 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 13950, loss[loss=0.1271, beats_loss=0.01133, ecapa_loss=0.0002217, whisper_loss=0.1135, over 22097.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.000185, whisper_loss=0.09299, over 3868216.59 frames. ], batch size: 90, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:47:31,626 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
24 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 03:47:31,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1443920.0, ans=0.125 2024-08-12 03:47:35,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1443920.0, ans=0.125 2024-08-12 03:47:37,648 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 19 from Vox, 14 fro AS 2024-08-12 03:47:41,870 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 03:47:59,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.550e+01 2.827e+01 3.293e+01 5.052e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 03:47:59,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1444120.0, ans=0.2 2024-08-12 03:48:10,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1444220.0, ans=0.2 2024-08-12 03:48:15,746 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 28 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 03:48:19,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1444220.0, ans=0.1 2024-08-12 03:48:24,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14000, loss[loss=0.1035, beats_loss=0.01193, ecapa_loss=0.00021, whisper_loss=0.08951, over 19440.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01103, ecapa_loss=0.0001843, whisper_loss=0.09323, over 3862147.30 frames. 
], batch size: 80, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:48:41,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1444420.0, ans=0.125 2024-08-12 03:48:49,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=1444420.0, ans=0.02 2024-08-12 03:48:53,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1444520.0, ans=0.125 2024-08-12 03:49:03,581 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 03:49:16,190 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 03:49:26,521 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 03:49:34,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14050, loss[loss=0.1019, beats_loss=0.01393, ecapa_loss=0.0001395, whisper_loss=0.08659, over 22940.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01107, ecapa_loss=0.0001832, whisper_loss=0.09383, over 3894478.53 frames. ], batch size: 91, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:49:43,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1444820.0, ans=0.125 2024-08-12 03:50:02,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1445020.0, ans=0.0 2024-08-12 03:50:19,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.615e+01 2.934e+01 3.537e+01 1.110e+02, threshold=5.868e+01, percent-clipped=2.0 2024-08-12 03:50:39,461 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
25 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 03:50:44,821 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14100, loss[loss=0.1048, beats_loss=0.01286, ecapa_loss=0.0001713, whisper_loss=0.09018, over 21467.00 frames. ], tot_loss[loss=0.1068, beats_loss=0.01106, ecapa_loss=0.0001825, whisper_loss=0.09387, over 3887561.97 frames. ], batch size: 87, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:51:09,408 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 03:51:11,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1445520.0, ans=0.1 2024-08-12 03:51:24,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1445620.0, ans=0.125 2024-08-12 03:51:28,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1445620.0, ans=0.0 2024-08-12 03:51:28,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-12 03:51:46,785 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 03:51:50,792 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 03:51:53,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14150, loss[loss=0.125, beats_loss=0.009201, ecapa_loss=0.0002112, whisper_loss=0.1137, over 21226.00 frames. ], tot_loss[loss=0.1062, beats_loss=0.01106, ecapa_loss=0.0001834, whisper_loss=0.09327, over 3855121.99 frames. ], batch size: 84, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:51:58,739 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 03:52:10,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1445920.0, ans=0.2 2024-08-12 03:52:19,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-12 03:52:30,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1446020.0, ans=0.0 2024-08-12 03:52:32,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1446120.0, ans=0.125 2024-08-12 03:52:36,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.480e+01 2.708e+01 3.118e+01 5.988e+01, threshold=5.416e+01, percent-clipped=1.0 2024-08-12 03:52:42,478 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 03:52:57,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-12 03:52:59,478 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 03:53:02,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14200, loss[loss=0.09857, beats_loss=0.01211, ecapa_loss=0.000206, whisper_loss=0.08441, over 20492.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01113, ecapa_loss=0.0001823, whisper_loss=0.09283, over 3869256.06 frames. 
], batch size: 86, lr: 6.17e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:53:04,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1446320.0, ans=0.125 2024-08-12 03:53:28,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1446420.0, ans=0.125 2024-08-12 03:53:33,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1446520.0, ans=0.05 2024-08-12 03:53:35,638 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 03:53:52,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1446620.0, ans=0.125 2024-08-12 03:53:52,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1446620.0, ans=0.1 2024-08-12 03:54:01,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2024-08-12 03:54:12,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14250, loss[loss=0.09186, beats_loss=0.01181, ecapa_loss=0.0001639, whisper_loss=0.07841, over 21813.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01115, ecapa_loss=0.0001818, whisper_loss=0.09204, over 3881040.11 frames. ], batch size: 84, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:54:20,371 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 03:54:30,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1446920.0, ans=0.125 2024-08-12 03:54:33,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1446920.0, ans=0.125 2024-08-12 03:54:39,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1446920.0, ans=0.04949747468305833 2024-08-12 03:54:40,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1447020.0, ans=0.125 2024-08-12 03:54:54,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2024-08-12 03:54:58,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.831e+01 3.136e+01 3.486e+01 5.154e+01, threshold=6.272e+01, percent-clipped=0.0 2024-08-12 03:55:05,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1447120.0, ans=0.125 2024-08-12 03:55:18,507 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-12 03:55:23,948 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14300, loss[loss=0.08857, beats_loss=0.007306, ecapa_loss=0.0002636, whisper_loss=0.07862, over 13607.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01124, ecapa_loss=0.00018, whisper_loss=0.09117, over 3864540.81 frames. 
], batch size: 60, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:55:28,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1447320.0, ans=0.125 2024-08-12 03:55:30,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1447320.0, ans=0.1 2024-08-12 03:55:32,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447320.0, ans=0.1 2024-08-12 03:55:48,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.55 vs. limit=10.0 2024-08-12 03:55:54,215 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 34 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 03:55:54,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2024-08-12 03:56:06,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1447620.0, ans=0.2 2024-08-12 03:56:09,660 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 03:56:23,905 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 03:56:32,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14350, loss[loss=0.1062, beats_loss=0.01002, ecapa_loss=0.0001819, whisper_loss=0.09432, over 16393.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0112, ecapa_loss=0.0001816, whisper_loss=0.09118, over 3866725.85 frames. 
], batch size: 63, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:56:33,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1447820.0, ans=0.0 2024-08-12 03:56:33,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447820.0, ans=0.1 2024-08-12 03:56:47,532 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 03:56:47,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1447920.0, ans=0.0 2024-08-12 03:56:50,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447920.0, ans=0.1 2024-08-12 03:56:56,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1447920.0, ans=0.0 2024-08-12 03:56:56,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1447920.0, ans=0.1 2024-08-12 03:57:04,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1448020.0, ans=0.125 2024-08-12 03:57:09,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1448020.0, ans=0.125 2024-08-12 03:57:17,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.654e+01 2.989e+01 3.360e+01 6.544e+01, threshold=5.979e+01, percent-clipped=1.0 2024-08-12 03:57:26,578 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 03:57:26,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1448120.0, ans=0.125 2024-08-12 03:57:27,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1448220.0, ans=0.125 2024-08-12 03:57:32,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1448220.0, ans=0.125 2024-08-12 03:57:39,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1448220.0, ans=0.0 2024-08-12 03:57:40,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-12 03:57:43,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14400, loss[loss=0.1044, beats_loss=0.01201, ecapa_loss=0.0001487, whisper_loss=0.09095, over 18335.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01114, ecapa_loss=0.0001843, whisper_loss=0.0924, over 3885814.35 frames. ], batch size: 71, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:57:45,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1448320.0, ans=0.125 2024-08-12 03:57:58,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1448420.0, ans=0.125 2024-08-12 03:58:04,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1448420.0, ans=0.2 2024-08-12 03:58:29,828 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-12 03:58:37,106 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 03:58:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1448720.0, ans=0.2 2024-08-12 03:58:43,843 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 03:58:44,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1448720.0, ans=0.125 2024-08-12 03:58:45,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1448720.0, ans=0.2 2024-08-12 03:58:48,249 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 03:58:52,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 10, batch 14450, loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001355, whisper_loss=0.09265, over 17203.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01117, ecapa_loss=0.0001847, whisper_loss=0.09247, over 3908503.20 frames. ], batch size: 61, lr: 6.16e-03, grad_scale: 1.152921504606847e+18 2024-08-12 03:58:52,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1448820.0, ans=0.125 2024-08-12 03:59:03,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=15.0 2024-08-12 03:59:19,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1449020.0, ans=0.125 2024-08-12 03:59:34,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.542e+01 2.850e+01 3.301e+01 1.207e+02, threshold=5.700e+01, percent-clipped=1.0 2024-08-12 03:59:52,562 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-10.pt 2024-08-12 04:00:35,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 0, loss[loss=0.1272, beats_loss=0.009193, ecapa_loss=0.0002214, whisper_loss=0.1158, over 22945.00 frames. ], tot_loss[loss=0.1272, beats_loss=0.009193, ecapa_loss=0.0002214, whisper_loss=0.1158, over 22945.00 frames. ], batch size: 92, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:00:35,394 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 04:01:15,774 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0005978, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 04:01:31,159 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on SV_voxceleb1: loss=0.004953, beats_loss=0, ecapa_loss=0.0004953, whisper_loss=0, over 939242.00 frames. 2024-08-12 04:02:18,042 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2226, 1.6376, 1.6576, 2.4211], device='cuda:0') 2024-08-12 04:03:27,038 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on AT_audioset: loss=0.02449, beats_loss=0.02449, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
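The `tot_loss` values reported in the records above are consistent with a weighted sum of the three distillation losses, using the scales from this run's config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A quick sanity check in Python, using the numbers copied from the "Epoch 10, batch 13500" record (this is an external reconstruction for verification, not the training code itself):

```python
# Verify: reported tot_loss ≈ 1.0*beats_loss + 10.0*ecapa_loss + 1.0*whisper_loss,
# with loss scales taken from the run config printed at the start of this log.
# Component values copied from the "Epoch 10, batch 13500" tot_loss record above.
beats, ecapa, whisper = 0.01102, 0.0001854, 0.09278
total = 1.0 * beats + 10.0 * ecapa + 1.0 * whisper
print(round(total, 4))  # 0.1057, matching the logged tot_loss[loss=0.1057, ...]
```

The same decomposition holds for the per-batch `loss[...]` entries, which explains why the ecapa component looks an order of magnitude smaller than its actual contribution to the total.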
2024-08-12 04:03:27,042 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 04:03:28,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1449260.0, ans=15.0 2024-08-12 04:03:35,052 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 04:03:42,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1449260.0, ans=0.125 2024-08-12 04:03:46,349 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 04:04:01,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1449360.0, ans=0.125 2024-08-12 04:04:06,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1449360.0, ans=10.0 2024-08-12 04:04:19,282 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 04:04:24,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1449460.0, ans=0.125 2024-08-12 04:04:49,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0 2024-08-12 04:05:18,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1449660.0, ans=0.2 2024-08-12 04:05:33,364 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 50, loss[loss=0.09401, beats_loss=0.01139, ecapa_loss=0.0002384, whisper_loss=0.08024, over 21779.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01066, ecapa_loss=0.0001874, whisper_loss=0.09135, over 902057.07 frames. 
], batch size: 90, lr: 5.88e-03, grad_scale: 1.152921504606847e+18 2024-08-12 04:05:34,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2024-08-12 04:05:39,059 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 04:06:04,785 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-12 04:06:39,777 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 04:06:47,871 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 04:07:07,249 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.409e-01 2024-08-12 04:07:08,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.147e+01 2.961e+01 3.212e+01 3.624e+01 5.944e+01, threshold=6.424e+01, percent-clipped=1.0 2024-08-12 04:07:30,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 100, loss[loss=0.1116, beats_loss=0.01108, ecapa_loss=0.0001753, whisper_loss=0.09878, over 22639.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01045, ecapa_loss=0.0001851, whisper_loss=0.09137, over 1543893.84 frames. ], batch size: 90, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:07:37,156 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 04:08:16,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. 
limit=15.0 2024-08-12 04:08:39,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1450460.0, ans=0.0 2024-08-12 04:09:05,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2024-08-12 04:09:53,638 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 04:09:55,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 150, loss[loss=0.1075, beats_loss=0.01183, ecapa_loss=0.0001602, whisper_loss=0.09407, over 19610.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01044, ecapa_loss=0.0001828, whisper_loss=0.09121, over 2016572.83 frames. ], batch size: 75, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:10:00,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1450760.0, ans=0.125 2024-08-12 04:10:19,838 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.109e-01 2024-08-12 04:10:26,952 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 04:10:36,481 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 04:10:50,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1450960.0, ans=0.1 2024-08-12 04:10:59,036 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
40 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 04:11:11,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1451060.0, ans=0.125 2024-08-12 04:11:15,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1451060.0, ans=0.2 2024-08-12 04:11:22,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-12 04:11:38,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.724e+01 3.107e+01 3.626e+01 6.235e+01, threshold=6.215e+01, percent-clipped=0.0 2024-08-12 04:11:38,305 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 04:12:04,625 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 200, loss[loss=0.107, beats_loss=0.01031, ecapa_loss=0.0002057, whisper_loss=0.0946, over 19530.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01045, ecapa_loss=0.0001836, whisper_loss=0.09219, over 2423230.01 frames. ], batch size: 80, lr: 5.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 04:12:10,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.76 vs. 
limit=12.0 2024-08-12 04:12:32,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1451360.0, ans=0.0 2024-08-12 04:12:49,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1451460.0, ans=0.125 2024-08-12 04:12:56,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1451460.0, ans=0.125 2024-08-12 04:13:48,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1451660.0, ans=0.1 2024-08-12 04:13:52,968 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 04:14:03,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1451760.0, ans=0.125 2024-08-12 04:14:04,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 250, loss[loss=0.1156, beats_loss=0.008495, ecapa_loss=0.000168, whisper_loss=0.1054, over 17819.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0107, ecapa_loss=0.0001825, whisper_loss=0.09134, over 2735796.25 frames. ], batch size: 64, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:14:29,156 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 04:14:48,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1451860.0, ans=0.1 2024-08-12 04:15:28,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1452060.0, ans=0.125 2024-08-12 04:15:41,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.465e+01 2.658e+01 3.015e+01 5.855e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-12 04:15:51,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1452160.0, ans=0.0 2024-08-12 04:16:01,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1452260.0, ans=0.125 2024-08-12 04:16:03,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 300, loss[loss=0.07737, beats_loss=0.01235, ecapa_loss=0.0001736, whisper_loss=0.06328, over 20254.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001808, whisper_loss=0.09022, over 2935413.23 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:16:05,675 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-12 04:16:12,660 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 04:16:32,155 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
23 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 04:16:37,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1452460.0, ans=0.125 2024-08-12 04:16:47,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1452560.0, ans=0.2 2024-08-12 04:16:50,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=12.0 2024-08-12 04:17:10,431 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-12 04:17:14,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 350, loss[loss=0.1255, beats_loss=0.007215, ecapa_loss=0.0002054, whisper_loss=0.1162, over 21166.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001815, whisper_loss=0.09056, over 3095876.02 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:17:18,749 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 04:17:24,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1452760.0, ans=0.125 2024-08-12 04:17:31,698 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 04:17:33,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1452860.0, ans=0.125 2024-08-12 04:17:41,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.74 vs. limit=22.5 2024-08-12 04:17:45,349 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-12 04:18:01,194 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 04:18:15,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2024-08-12 04:18:15,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.542e+01 2.799e+01 3.205e+01 6.505e+01, threshold=5.597e+01, percent-clipped=2.0 2024-08-12 04:18:28,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 400, loss[loss=0.1059, beats_loss=0.01014, ecapa_loss=0.0001723, whisper_loss=0.09408, over 21834.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001816, whisper_loss=0.09164, over 3285224.58 frames. ], batch size: 89, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:18:28,677 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 04:18:35,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-12 04:18:36,120 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 04:18:37,594 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 04:18:52,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1453360.0, ans=0.2 2024-08-12 04:19:05,144 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-12 04:19:09,069 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 04:19:10,979 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 04:19:12,254 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 04:19:17,741 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 18 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 04:19:19,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1453560.0, ans=0.125 2024-08-12 04:19:40,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 450, loss[loss=0.1099, beats_loss=0.011, ecapa_loss=0.0001972, whisper_loss=0.09694, over 17017.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001824, whisper_loss=0.09114, over 3374289.41 frames. ], batch size: 70, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:19:42,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1453760.0, ans=0.05 2024-08-12 04:19:44,304 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 04:19:44,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-12 04:19:47,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1453760.0, ans=0.0 2024-08-12 04:19:51,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1453760.0, ans=0.2 2024-08-12 04:20:41,369 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.543e+01 2.883e+01 3.316e+01 4.776e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-12 04:20:54,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 500, loss[loss=0.1012, beats_loss=0.009667, ecapa_loss=0.0002645, whisper_loss=0.08887, over 21336.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001818, whisper_loss=0.091, over 3496283.42 frames. 
], batch size: 91, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:21:00,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1454260.0, ans=0.0 2024-08-12 04:21:03,203 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 04:21:14,732 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 04:21:22,862 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 04:21:24,317 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 26 from LS+wenet, 14 from Vox, 15 fro AS 2024-08-12 04:21:34,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1454460.0, ans=0.0 2024-08-12 04:21:58,771 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 04:22:00,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1454660.0, ans=0.125 2024-08-12 04:22:05,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1454660.0, ans=0.125 2024-08-12 04:22:06,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2024-08-12 04:22:09,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 550, loss[loss=0.1118, beats_loss=0.009465, ecapa_loss=0.0001868, whisper_loss=0.1005, over 16853.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001795, whisper_loss=0.09144, over 3590047.88 frames. 
], batch size: 67, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:22:09,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2024-08-12 04:22:17,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1454760.0, ans=0.0 2024-08-12 04:22:23,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1454860.0, ans=0.2 2024-08-12 04:22:25,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1454860.0, ans=0.0 2024-08-12 04:22:29,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1454860.0, ans=0.125 2024-08-12 04:22:32,370 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 04:22:43,909 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 12 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 04:22:45,234 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 04:22:58,473 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:23:04,328 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 04:23:08,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.603e+01 2.842e+01 3.155e+01 5.740e+01, threshold=5.685e+01, percent-clipped=0.0 2024-08-12 04:23:08,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1455160.0, ans=0.125 2024-08-12 04:23:21,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 600, loss[loss=0.1064, beats_loss=0.01161, ecapa_loss=0.0001604, whisper_loss=0.09314, over 19106.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001784, whisper_loss=0.09204, over 3635043.41 frames. ], batch size: 73, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:23:26,676 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-12 04:23:37,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1455360.0, ans=0.125 2024-08-12 04:23:38,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1455360.0, ans=0.1 2024-08-12 04:23:41,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1455360.0, ans=0.0 2024-08-12 04:23:51,495 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 04:24:03,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1455460.0, ans=0.125 2024-08-12 04:24:05,035 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 04:24:09,420 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 04:24:35,340 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 650, loss[loss=0.1088, beats_loss=0.01033, ecapa_loss=0.000172, whisper_loss=0.09673, over 21036.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01091, ecapa_loss=0.0001775, whisper_loss=0.09092, over 3692307.51 frames. ], batch size: 83, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:24:40,156 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 04:24:53,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1455860.0, ans=0.125 2024-08-12 04:24:53,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1455860.0, ans=0.1 2024-08-12 04:24:56,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1455860.0, ans=0.125 2024-08-12 04:25:00,261 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 04:25:12,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1455960.0, ans=0.5 2024-08-12 04:25:12,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. 
limit=15.0 2024-08-12 04:25:26,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1456060.0, ans=0.0 2024-08-12 04:25:35,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.475e+01 2.766e+01 3.282e+01 4.630e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 04:25:35,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1456160.0, ans=0.2 2024-08-12 04:25:36,862 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 04:25:38,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1456160.0, ans=0.0 2024-08-12 04:25:42,946 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 04:25:48,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 700, loss[loss=0.09746, beats_loss=0.01204, ecapa_loss=0.0001555, whisper_loss=0.08387, over 18073.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01086, ecapa_loss=0.0001777, whisper_loss=0.09125, over 3721856.08 frames. ], batch size: 71, lr: 5.87e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:26:03,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1456360.0, ans=0.0 2024-08-12 04:26:31,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1456460.0, ans=0.125 2024-08-12 04:26:41,112 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
23 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 04:26:41,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1456560.0, ans=0.0 2024-08-12 04:26:51,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1456660.0, ans=0.2 2024-08-12 04:27:07,387 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 750, loss[loss=0.119, beats_loss=0.009764, ecapa_loss=0.0001877, whisper_loss=0.1073, over 17688.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01093, ecapa_loss=0.0001768, whisper_loss=0.09093, over 3724404.06 frames. ], batch size: 69, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:27:14,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2024-08-12 04:27:54,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1456960.0, ans=0.125 2024-08-12 04:28:02,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1457060.0, ans=0.125 2024-08-12 04:28:16,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.543e+01 2.919e+01 3.268e+01 8.785e+01, threshold=5.838e+01, percent-clipped=1.0 2024-08-12 04:28:18,477 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 04:28:23,888 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
25 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 04:28:29,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1457160.0, ans=0.0 2024-08-12 04:28:31,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1457260.0, ans=0.0 2024-08-12 04:28:32,369 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 800, loss[loss=0.1383, beats_loss=0.007771, ecapa_loss=0.0001699, whisper_loss=0.1288, over 23714.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001774, whisper_loss=0.09118, over 3726754.07 frames. ], batch size: 90, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:28:36,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1457260.0, ans=0.2 2024-08-12 04:28:39,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1457260.0, ans=0.0 2024-08-12 04:28:45,454 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 04:28:49,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1457360.0, ans=0.125 2024-08-12 04:28:57,802 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 04:29:13,331 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 04:29:28,751 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 04:29:32,974 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 04:29:43,122 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 04:29:52,599 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 850, loss[loss=0.0862, beats_loss=0.01223, ecapa_loss=0.0001544, whisper_loss=0.07242, over 16936.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01089, ecapa_loss=0.0001773, whisper_loss=0.09062, over 3748696.97 frames. ], batch size: 66, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:30:04,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1457760.0, ans=0.125 2024-08-12 04:30:06,771 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 04:30:09,532 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 04:30:31,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.59 vs. limit=6.0 2024-08-12 04:30:40,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1458060.0, ans=0.0 2024-08-12 04:30:46,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=15.0 2024-08-12 04:30:57,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.636e+01 2.987e+01 3.471e+01 7.869e+01, threshold=5.974e+01, percent-clipped=5.0 2024-08-12 04:31:02,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1458160.0, ans=0.125 2024-08-12 04:31:08,358 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 04:31:10,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 900, loss[loss=0.1212, beats_loss=0.008654, ecapa_loss=0.0001428, whisper_loss=0.1111, over 16720.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.000177, whisper_loss=0.09129, over 3747207.41 frames. ], batch size: 60, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:31:14,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1458260.0, ans=0.035 2024-08-12 04:31:24,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1458260.0, ans=0.125 2024-08-12 04:31:29,649 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 04:31:31,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1458360.0, ans=0.125 2024-08-12 04:31:35,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1458360.0, ans=0.0 2024-08-12 04:31:52,493 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 04:32:18,960 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 04:32:28,948 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 04:32:30,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1458660.0, ans=0.125 2024-08-12 04:32:32,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1458760.0, ans=0.125 2024-08-12 04:32:34,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 950, loss[loss=0.0935, beats_loss=0.01122, ecapa_loss=0.0001551, whisper_loss=0.08074, over 21996.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001768, whisper_loss=0.09159, over 3792282.89 frames. ], batch size: 90, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:32:51,286 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 04:33:44,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.023e+01 2.653e+01 2.939e+01 3.386e+01 4.997e+01, threshold=5.879e+01, percent-clipped=0.0 2024-08-12 04:33:52,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1459160.0, ans=0.2 2024-08-12 04:33:53,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1459160.0, ans=0.0 2024-08-12 04:34:00,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1000, loss[loss=0.11, beats_loss=0.008548, ecapa_loss=0.0001931, whisper_loss=0.09949, over 15046.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0108, ecapa_loss=0.0001765, whisper_loss=0.09088, over 3781691.67 frames. ], batch size: 57, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:34:10,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. 
limit=10.0 2024-08-12 04:34:25,023 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 04:34:57,062 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 04:35:01,900 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 04:35:06,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1459660.0, ans=0.2 2024-08-12 04:35:11,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1459660.0, ans=0.125 2024-08-12 04:35:21,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1050, loss[loss=0.1081, beats_loss=0.009056, ecapa_loss=0.0002484, whisper_loss=0.09657, over 21579.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01086, ecapa_loss=0.0001763, whisper_loss=0.09099, over 3786984.10 frames. ], batch size: 94, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:35:22,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1459760.0, ans=0.1 2024-08-12 04:35:42,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1459860.0, ans=0.0 2024-08-12 04:35:43,385 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 04:35:50,960 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 04:36:08,978 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 04:36:17,545 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 04:36:23,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1460060.0, ans=0.125 2024-08-12 04:36:24,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1460060.0, ans=0.0 2024-08-12 04:36:31,655 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 04:36:33,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+01 2.762e+01 2.974e+01 3.480e+01 4.829e+01, threshold=5.949e+01, percent-clipped=0.0 2024-08-12 04:36:44,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-08-12 04:36:48,819 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1100, loss[loss=0.1025, beats_loss=0.01206, ecapa_loss=0.0001236, whisper_loss=0.08921, over 23473.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01096, ecapa_loss=0.0001758, whisper_loss=0.0906, over 3828767.57 frames. ], batch size: 90, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:36:49,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1460260.0, ans=0.125 2024-08-12 04:37:08,440 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
19 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 04:37:21,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1460460.0, ans=0.1 2024-08-12 04:37:43,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1460560.0, ans=0.0 2024-08-12 04:37:56,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-08-12 04:38:06,987 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 04:38:12,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1460760.0, ans=0.0 2024-08-12 04:38:13,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1150, loss[loss=0.09508, beats_loss=0.01108, ecapa_loss=0.0001698, whisper_loss=0.08231, over 23697.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.000176, whisper_loss=0.09093, over 3833409.34 frames. ], batch size: 94, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:38:49,852 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 04:39:01,224 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 04:39:07,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1461060.0, ans=0.035 2024-08-12 04:39:12,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1461060.0, ans=0.04949747468305833 2024-08-12 04:39:17,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1461160.0, ans=0.125 2024-08-12 04:39:19,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.588e+01 2.774e+01 3.143e+01 5.777e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 04:39:26,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1461160.0, ans=0.125 2024-08-12 04:39:33,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1200, loss[loss=0.09092, beats_loss=0.0111, ecapa_loss=0.0001742, whisper_loss=0.07808, over 17222.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01102, ecapa_loss=0.0001742, whisper_loss=0.09096, over 3852665.89 frames. ], batch size: 69, lr: 5.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:39:47,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1461260.0, ans=0.125 2024-08-12 04:40:09,008 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 04:40:12,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.93 vs. 
limit=6.0 2024-08-12 04:40:14,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1461460.0, ans=0.0 2024-08-12 04:40:27,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1461560.0, ans=0.125 2024-08-12 04:40:32,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-12 04:40:33,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-08-12 04:40:38,581 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 04:40:38,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1461660.0, ans=0.125 2024-08-12 04:40:46,032 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 04:40:50,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1461660.0, ans=0.0 2024-08-12 04:40:55,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1461760.0, ans=0.025 2024-08-12 04:40:57,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1250, loss[loss=0.09424, beats_loss=0.01, ecapa_loss=0.0001827, whisper_loss=0.0824, over 16285.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01102, ecapa_loss=0.0001747, whisper_loss=0.09084, over 3804595.59 frames. ], batch size: 65, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:41:01,312 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-12 04:41:29,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1461860.0, ans=0.125 2024-08-12 04:41:53,629 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 28 from Vox, 22 fro AS 2024-08-12 04:42:08,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.564e+01 2.833e+01 3.209e+01 5.019e+01, threshold=5.666e+01, percent-clipped=0.0 2024-08-12 04:42:24,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1300, loss[loss=0.1068, beats_loss=0.008888, ecapa_loss=0.0001868, whisper_loss=0.09605, over 17389.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001754, whisper_loss=0.09152, over 3825238.16 frames. ], batch size: 68, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:42:32,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-12 04:42:36,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1462260.0, ans=0.1 2024-08-12 04:42:53,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1462360.0, ans=0.0 2024-08-12 04:43:27,410 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 04:43:41,295 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-12 04:43:46,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1350, loss[loss=0.08703, beats_loss=0.01155, ecapa_loss=0.0001894, whisper_loss=0.07359, over 19327.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001748, whisper_loss=0.0915, over 3824061.24 frames. 
], batch size: 81, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:43:54,293 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 04:43:59,730 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 04:44:10,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1462860.0, ans=0.125 2024-08-12 04:44:20,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1462860.0, ans=0.125 2024-08-12 04:44:27,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1462960.0, ans=0.125 2024-08-12 04:44:29,090 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 04:44:50,091 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 04:44:57,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1463160.0, ans=0.125 2024-08-12 04:44:58,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.596e+01 2.848e+01 3.248e+01 6.741e+01, threshold=5.696e+01, percent-clipped=1.0 2024-08-12 04:45:01,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1463160.0, ans=0.1 2024-08-12 04:45:11,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1400, loss[loss=0.08692, beats_loss=0.01053, ecapa_loss=0.0001854, whisper_loss=0.07453, over 15026.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001742, whisper_loss=0.09137, over 3823051.30 frames. 
], batch size: 58, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:45:21,911 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-12 04:45:41,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1463360.0, ans=0.125 2024-08-12 04:45:42,740 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 25 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-12 04:45:44,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1463460.0, ans=0.0 2024-08-12 04:46:12,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1463560.0, ans=0.125 2024-08-12 04:46:59,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1450, loss[loss=0.1015, beats_loss=0.01146, ecapa_loss=0.000151, whisper_loss=0.08849, over 18265.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01096, ecapa_loss=0.0001734, whisper_loss=0.09127, over 3820365.22 frames. ], batch size: 70, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:47:07,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1463760.0, ans=0.0 2024-08-12 04:47:18,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1463860.0, ans=0.125 2024-08-12 04:47:25,878 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 04:47:45,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:47:53,782 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
20 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 04:48:05,910 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.411e+01 2.800e+01 3.262e+01 9.547e+01, threshold=5.600e+01, percent-clipped=2.0 2024-08-12 04:48:11,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1464160.0, ans=0.1 2024-08-12 04:48:20,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1500, loss[loss=0.1093, beats_loss=0.008008, ecapa_loss=0.0002058, whisper_loss=0.09926, over 18560.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01101, ecapa_loss=0.0001734, whisper_loss=0.09044, over 3820412.55 frames. ], batch size: 74, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:48:32,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1464260.0, ans=0.2 2024-08-12 04:48:46,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1464360.0, ans=0.2 2024-08-12 04:48:46,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=1464360.0, ans=12.0 2024-08-12 04:48:55,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2024-08-12 04:48:56,855 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
17 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 04:48:57,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1464460.0, ans=0.2 2024-08-12 04:49:04,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1464460.0, ans=0.2 2024-08-12 04:49:07,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1464560.0, ans=0.125 2024-08-12 04:49:13,446 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 04:49:24,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 04:49:34,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1464660.0, ans=0.2 2024-08-12 04:49:40,886 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1550, loss[loss=0.09372, beats_loss=0.01094, ecapa_loss=0.0001994, whisper_loss=0.08079, over 18281.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01103, ecapa_loss=0.0001733, whisper_loss=0.09014, over 3801948.93 frames. ], batch size: 75, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:49:46,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1464760.0, ans=0.125 2024-08-12 04:50:16,113 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 04:50:24,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.16 vs. 
limit=22.5 2024-08-12 04:50:42,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1465160.0, ans=0.125 2024-08-12 04:50:45,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.381e+01 2.640e+01 3.042e+01 4.916e+01, threshold=5.281e+01, percent-clipped=0.0 2024-08-12 04:50:59,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1600, loss[loss=0.07222, beats_loss=0.01393, ecapa_loss=0.0001397, whisper_loss=0.05689, over 17816.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01102, ecapa_loss=0.0001723, whisper_loss=0.09067, over 3832940.82 frames. ], batch size: 72, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:51:01,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1465260.0, ans=0.1 2024-08-12 04:51:08,641 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 04:51:20,826 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 04:51:31,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=22.5 2024-08-12 04:51:32,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1465460.0, ans=0.125 2024-08-12 04:51:54,584 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 04:51:56,105 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 11 from Vox, 49 fro AS 2024-08-12 04:52:11,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1465660.0, ans=0.035 2024-08-12 04:52:16,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1650, loss[loss=0.08816, beats_loss=0.01086, ecapa_loss=0.0001935, whisper_loss=0.07537, over 15174.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001732, whisper_loss=0.09135, over 3869987.41 frames. ], batch size: 62, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:52:30,348 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 04:52:30,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1465860.0, ans=0.0 2024-08-12 04:52:30,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1465860.0, ans=0.1 2024-08-12 04:52:38,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-12 04:52:49,987 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 04:52:55,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1465960.0, ans=0.1 2024-08-12 04:53:19,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.459e+01 2.653e+01 3.242e+01 4.506e+01, threshold=5.307e+01, percent-clipped=0.0 2024-08-12 04:53:23,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. 
limit=6.0 2024-08-12 04:53:33,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1700, loss[loss=0.115, beats_loss=0.01016, ecapa_loss=0.0001635, whisper_loss=0.1032, over 19036.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01103, ecapa_loss=0.0001725, whisper_loss=0.09124, over 3836559.29 frames. ], batch size: 75, lr: 5.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:54:01,518 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 04:54:29,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2024-08-12 04:54:32,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1466560.0, ans=0.0 2024-08-12 04:54:43,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1466660.0, ans=0.125 2024-08-12 04:54:50,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1750, loss[loss=0.1155, beats_loss=0.009543, ecapa_loss=0.0001802, whisper_loss=0.1042, over 21433.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001731, whisper_loss=0.09117, over 3839203.52 frames. ], batch size: 85, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:55:07,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1466860.0, ans=0.035 2024-08-12 04:55:13,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.19 vs. 
limit=15.0 2024-08-12 04:55:30,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1466960.0, ans=0.025 2024-08-12 04:55:30,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1466960.0, ans=0.125 2024-08-12 04:55:35,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1467060.0, ans=0.2 2024-08-12 04:55:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1467160.0, ans=0.1 2024-08-12 04:55:53,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.424e+01 2.723e+01 3.040e+01 5.517e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-12 04:56:07,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1800, loss[loss=0.1182, beats_loss=0.00879, ecapa_loss=0.0001815, whisper_loss=0.1076, over 19220.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0109, ecapa_loss=0.0001742, whisper_loss=0.09077, over 3823659.04 frames. ], batch size: 76, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:56:24,832 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 04:56:26,335 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 04:56:51,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1467560.0, ans=0.125 2024-08-12 04:57:04,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1467560.0, ans=0.07 2024-08-12 04:57:08,387 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 04:57:20,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1467660.0, ans=0.0 2024-08-12 04:57:24,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1850, loss[loss=0.1307, beats_loss=0.01166, ecapa_loss=0.0001543, whisper_loss=0.1175, over 23337.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01091, ecapa_loss=0.0001735, whisper_loss=0.09091, over 3837225.89 frames. ], batch size: 91, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:58:08,905 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 04:58:11,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1468060.0, ans=0.0 2024-08-12 04:58:14,564 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 33 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 04:58:20,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-12 04:58:27,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.532e+01 2.817e+01 3.253e+01 1.073e+02, threshold=5.635e+01, percent-clipped=1.0 2024-08-12 04:58:29,142 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 04:58:29,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1468160.0, ans=0.1 2024-08-12 04:58:34,682 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 04:58:37,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.12 vs. 
limit=10.0 2024-08-12 04:58:38,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1468160.0, ans=0.1 2024-08-12 04:58:41,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1900, loss[loss=0.1102, beats_loss=0.009936, ecapa_loss=0.0002222, whisper_loss=0.09805, over 17538.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001749, whisper_loss=0.091, over 3826231.12 frames. ], batch size: 73, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 04:58:53,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1468260.0, ans=0.1 2024-08-12 04:58:53,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1468260.0, ans=0.125 2024-08-12 04:59:01,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1468360.0, ans=0.2 2024-08-12 04:59:05,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1468360.0, ans=0.0 2024-08-12 04:59:10,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1468360.0, ans=0.1 2024-08-12 04:59:23,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-12 04:59:30,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.66 vs. 
limit=22.5 2024-08-12 04:59:55,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1468660.0, ans=0.0 2024-08-12 04:59:59,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 1950, loss[loss=0.09033, beats_loss=0.01198, ecapa_loss=0.0001754, whisper_loss=0.0766, over 18970.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01099, ecapa_loss=0.0001755, whisper_loss=0.09051, over 3820122.93 frames. ], batch size: 73, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:00:06,866 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 05:00:12,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1468760.0, ans=0.0 2024-08-12 05:00:33,312 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 05:00:38,535 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:00:41,225 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-12 05:00:44,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1469060.0, ans=0.0 2024-08-12 05:00:53,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.64 vs. limit=10.0 2024-08-12 05:01:01,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.456e+01 2.694e+01 2.989e+01 6.245e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-12 05:01:02,031 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
27 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 05:01:15,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2000, loss[loss=0.09893, beats_loss=0.01429, ecapa_loss=0.0001464, whisper_loss=0.08317, over 21347.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001768, whisper_loss=0.09084, over 3841074.05 frames. ], batch size: 86, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:01:25,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1469260.0, ans=0.125 2024-08-12 05:01:30,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1469360.0, ans=0.95 2024-08-12 05:01:35,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1469360.0, ans=0.0 2024-08-12 05:01:39,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1469360.0, ans=0.0 2024-08-12 05:01:49,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1469460.0, ans=0.125 2024-08-12 05:02:12,969 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 05:02:26,745 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 05:02:28,369 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 05:02:28,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1469660.0, ans=0.125 2024-08-12 05:02:34,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2050, loss[loss=0.09237, beats_loss=0.01258, ecapa_loss=0.0001571, whisper_loss=0.07821, over 21931.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.011, ecapa_loss=0.0001756, whisper_loss=0.09056, over 3822630.33 frames. ], batch size: 90, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:03:14,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1469960.0, ans=0.0 2024-08-12 05:03:24,826 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 05:03:27,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1470060.0, ans=0.1 2024-08-12 05:03:37,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.542e+01 2.738e+01 3.129e+01 4.867e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-12 05:03:48,069 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 05:03:50,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2100, loss[loss=0.107, beats_loss=0.01141, ecapa_loss=0.0001773, whisper_loss=0.09382, over 22470.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01106, ecapa_loss=0.0001761, whisper_loss=0.09085, over 3843058.50 frames. ], batch size: 88, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:04:07,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1470360.0, ans=0.2 2024-08-12 05:04:09,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1470360.0, ans=0.0 2024-08-12 05:04:09,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1470360.0, ans=0.2 2024-08-12 05:04:15,004 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 32 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 05:04:45,750 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 05:04:52,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1470660.0, ans=0.1 2024-08-12 05:04:52,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1470660.0, ans=0.125 2024-08-12 05:04:53,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1470660.0, ans=0.125 2024-08-12 05:05:03,436 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 24 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 05:05:07,969 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2150, loss[loss=0.1147, beats_loss=0.009191, ecapa_loss=0.0001649, whisper_loss=0.1038, over 15412.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001765, whisper_loss=0.09133, over 3822237.87 frames. ], batch size: 59, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:05:13,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-12 05:05:16,116 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 05:05:19,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1470760.0, ans=0.125 2024-08-12 05:05:33,889 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 05:05:40,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1470960.0, ans=0.0 2024-08-12 05:05:41,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1470960.0, ans=0.0 2024-08-12 05:05:41,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1470960.0, ans=0.125 2024-08-12 05:05:44,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1470960.0, ans=0.125 2024-08-12 05:05:50,479 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 05:06:09,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.509e+01 2.893e+01 3.375e+01 5.887e+01, threshold=5.785e+01, percent-clipped=2.0 2024-08-12 05:06:19,604 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-08-12 05:06:23,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2200, loss[loss=0.1239, beats_loss=0.01118, ecapa_loss=0.0001709, whisper_loss=0.1111, over 17982.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01103, ecapa_loss=0.0001774, whisper_loss=0.09255, over 3828618.76 frames. ], batch size: 71, lr: 5.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 05:06:33,816 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-12 05:06:46,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1471360.0, ans=0.125 2024-08-12 05:06:54,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1471460.0, ans=0.125 2024-08-12 05:07:01,956 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-12 05:07:21,837 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 05:07:32,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1471660.0, ans=0.125 2024-08-12 05:07:41,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2250, loss[loss=0.1133, beats_loss=0.01302, ecapa_loss=0.0001739, whisper_loss=0.0985, over 23395.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01108, ecapa_loss=0.0001785, whisper_loss=0.09265, over 3826197.31 frames. ], batch size: 92, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:07:55,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2024-08-12 05:08:05,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1471860.0, ans=0.0 2024-08-12 05:08:22,745 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.463e+05 2024-08-12 05:08:24,238 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:08:35,847 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 05:08:37,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1472060.0, ans=0.1 2024-08-12 05:08:41,340 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 05:08:54,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.613e+01 2.941e+01 3.406e+01 8.387e+01, threshold=5.883e+01, percent-clipped=3.0 2024-08-12 05:09:09,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2024-08-12 05:09:11,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2300, loss[loss=0.09761, beats_loss=0.01199, ecapa_loss=0.000177, whisper_loss=0.08384, over 14794.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.0111, ecapa_loss=0.000178, whisper_loss=0.09285, over 3857686.78 frames. ], batch size: 59, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:09:19,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-12 05:09:40,376 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 05:10:03,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=22.5 2024-08-12 05:10:42,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1472660.0, ans=6.0 2024-08-12 05:10:46,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2350, loss[loss=0.1299, beats_loss=0.008125, ecapa_loss=0.0001728, whisper_loss=0.12, over 16935.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001788, whisper_loss=0.09298, over 3847576.62 frames. ], batch size: 63, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:10:54,433 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 05:11:11,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1472860.0, ans=0.0 2024-08-12 05:11:21,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-08-12 05:11:55,749 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-12 05:12:00,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-12 05:12:03,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1473060.0, ans=0.125 2024-08-12 05:12:10,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1473060.0, ans=0.0 2024-08-12 05:12:12,937 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 05:12:18,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.614e+01 3.008e+01 3.445e+01 5.971e+01, threshold=6.017e+01, percent-clipped=1.0 2024-08-12 05:12:24,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1473160.0, ans=0.0 2024-08-12 05:12:26,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1473160.0, ans=0.2 2024-08-12 05:12:30,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1473160.0, ans=0.125 2024-08-12 05:12:37,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2400, loss[loss=0.1093, beats_loss=0.0101, ecapa_loss=0.000187, whisper_loss=0.09733, over 22848.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01097, ecapa_loss=0.0001793, whisper_loss=0.0933, over 3878087.36 frames. ], batch size: 91, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:12:41,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1473260.0, ans=0.2 2024-08-12 05:12:53,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1473260.0, ans=0.125 2024-08-12 05:13:08,436 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 05:13:13,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1473360.0, ans=0.1 2024-08-12 05:13:25,057 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
12 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 05:13:25,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1473460.0, ans=0.125 2024-08-12 05:13:26,496 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 05:13:51,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1473560.0, ans=0.125 2024-08-12 05:14:17,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1473660.0, ans=0.1 2024-08-12 05:14:20,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2450, loss[loss=0.09287, beats_loss=0.01255, ecapa_loss=0.0001466, whisper_loss=0.07885, over 17726.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01096, ecapa_loss=0.0001794, whisper_loss=0.09301, over 3872122.85 frames. ], batch size: 70, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:14:23,027 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 05:14:46,839 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-12 05:14:54,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1473860.0, ans=0.0 2024-08-12 05:14:59,487 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 05:15:07,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1473960.0, ans=0.2 2024-08-12 05:15:17,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. 
limit=6.0 2024-08-12 05:15:38,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.575e+01 2.893e+01 3.388e+01 4.265e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 05:15:47,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1474160.0, ans=0.0 2024-08-12 05:15:50,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1474260.0, ans=0.125 2024-08-12 05:15:51,425 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2500, loss[loss=0.09802, beats_loss=0.009743, ecapa_loss=0.0002011, whisper_loss=0.08626, over 20372.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.0109, ecapa_loss=0.0001794, whisper_loss=0.09321, over 3867391.73 frames. ], batch size: 84, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:16:03,928 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 05:16:09,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1474360.0, ans=0.1 2024-08-12 05:16:15,560 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 10 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-12 05:16:34,694 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 05:16:47,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1474660.0, ans=0.07 2024-08-12 05:16:54,831 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2550, loss[loss=0.108, beats_loss=0.01076, ecapa_loss=0.0001474, whisper_loss=0.09576, over 23188.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01097, ecapa_loss=0.0001792, whisper_loss=0.09301, over 3877230.45 frames. 
], batch size: 89, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:16:56,143 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 05:17:09,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1474860.0, ans=0.1 2024-08-12 05:17:12,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1474860.0, ans=0.05 2024-08-12 05:17:36,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1475060.0, ans=0.125 2024-08-12 05:17:41,728 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 05:17:45,461 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 05:17:45,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1475160.0, ans=0.125 2024-08-12 05:17:46,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2024-08-12 05:17:47,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+01 2.613e+01 2.908e+01 3.447e+01 1.061e+02, threshold=5.817e+01, percent-clipped=1.0 2024-08-12 05:17:59,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2600, loss[loss=0.09788, beats_loss=0.007821, ecapa_loss=0.0002087, whisper_loss=0.08797, over 18446.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001794, whisper_loss=0.09257, over 3867194.23 frames. 
], batch size: 75, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:18:15,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475360.0, ans=0.1 2024-08-12 05:18:16,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1475360.0, ans=0.0 2024-08-12 05:18:17,174 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:18:43,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.89 vs. limit=22.5 2024-08-12 05:18:55,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1475660.0, ans=0.5 2024-08-12 05:18:56,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1475660.0, ans=0.2 2024-08-12 05:19:01,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1475660.0, ans=0.1 2024-08-12 05:19:03,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2650, loss[loss=0.09812, beats_loss=0.009425, ecapa_loss=0.0002129, whisper_loss=0.08656, over 23230.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01101, ecapa_loss=0.0001794, whisper_loss=0.09166, over 3864567.07 frames. 
], batch size: 92, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:19:06,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1475760.0, ans=0.125 2024-08-12 05:19:06,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1475760.0, ans=0.125 2024-08-12 05:19:16,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1475860.0, ans=0.125 2024-08-12 05:19:19,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1475860.0, ans=0.1 2024-08-12 05:19:20,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1475860.0, ans=0.0 2024-08-12 05:19:29,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=12.0 2024-08-12 05:19:29,829 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 05:19:37,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1475960.0, ans=0.1 2024-08-12 05:19:50,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1476060.0, ans=0.125 2024-08-12 05:19:53,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1476060.0, ans=0.125 2024-08-12 05:19:54,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1476160.0, ans=0.0 2024-08-12 05:19:56,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.506e+01 2.786e+01 3.189e+01 5.235e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-12 05:20:08,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2700, loss[loss=0.1289, beats_loss=0.01128, ecapa_loss=0.0001675, whisper_loss=0.1159, over 21282.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01108, ecapa_loss=0.0001787, whisper_loss=0.09191, over 3889502.22 frames. ], batch size: 80, lr: 5.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:20:10,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-12 05:20:48,514 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 05:20:49,751 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-12 05:20:55,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1476560.0, ans=0.125 2024-08-12 05:20:58,503 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
19 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 05:21:13,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2750, loss[loss=0.07515, beats_loss=0.01536, ecapa_loss=0.0001422, whisper_loss=0.05837, over 13899.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01109, ecapa_loss=0.0001777, whisper_loss=0.09107, over 3862835.92 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:21:14,557 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 05:21:24,741 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 05:21:50,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1477060.0, ans=0.0 2024-08-12 05:21:52,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1477060.0, ans=0.125 2024-08-12 05:21:58,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1477060.0, ans=0.125 2024-08-12 05:22:05,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.574e+01 2.886e+01 3.333e+01 4.847e+01, threshold=5.772e+01, percent-clipped=0.0 2024-08-12 05:22:08,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1477160.0, ans=0.0 2024-08-12 05:22:17,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2800, loss[loss=0.1047, beats_loss=0.009575, ecapa_loss=0.0002203, whisper_loss=0.09287, over 16671.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001781, whisper_loss=0.09158, over 3889143.11 frames. 
], batch size: 68, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:22:19,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1477260.0, ans=0.125 2024-08-12 05:22:21,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1477260.0, ans=0.025 2024-08-12 05:22:25,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-08-12 05:22:34,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1477360.0, ans=0.125 2024-08-12 05:22:54,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1477460.0, ans=0.09899494936611666 2024-08-12 05:22:55,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1477560.0, ans=0.1 2024-08-12 05:23:02,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1477560.0, ans=0.0 2024-08-12 05:23:03,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1477560.0, ans=0.09899494936611666 2024-08-12 05:23:06,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1477560.0, ans=0.125 2024-08-12 05:23:07,700 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 05:23:10,337 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 05:23:11,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1477660.0, ans=0.0 2024-08-12 05:23:25,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2850, loss[loss=0.1058, beats_loss=0.01121, ecapa_loss=0.0001373, whisper_loss=0.09323, over 18587.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01116, ecapa_loss=0.0001774, whisper_loss=0.0912, over 3892056.36 frames. ], batch size: 71, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:23:30,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1477760.0, ans=0.0 2024-08-12 05:23:45,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1477860.0, ans=0.0 2024-08-12 05:23:45,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1477860.0, ans=0.125 2024-08-12 05:23:45,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1477860.0, ans=0.5 2024-08-12 05:23:48,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1477860.0, ans=0.0 2024-08-12 05:23:48,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.51 vs. limit=6.0 2024-08-12 05:24:08,944 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
16 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-12 05:24:14,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1478060.0, ans=0.125 2024-08-12 05:24:28,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1478160.0, ans=0.125 2024-08-12 05:24:30,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.606e+01 3.053e+01 3.517e+01 5.532e+01, threshold=6.106e+01, percent-clipped=0.0 2024-08-12 05:24:44,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2900, loss[loss=0.09772, beats_loss=0.01136, ecapa_loss=0.0001863, whisper_loss=0.08449, over 21614.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01114, ecapa_loss=0.0001788, whisper_loss=0.09165, over 3879681.80 frames. ], batch size: 89, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:24:55,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-08-12 05:24:58,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1478260.0, ans=0.0 2024-08-12 05:24:59,479 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 05:25:09,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-08-12 05:25:24,100 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 05:25:42,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1478660.0, ans=0.0 2024-08-12 05:25:55,254 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 2950, loss[loss=0.08845, beats_loss=0.01257, ecapa_loss=0.0002124, whisper_loss=0.07375, over 21909.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01124, ecapa_loss=0.0001797, whisper_loss=0.09129, over 3902476.15 frames. ], batch size: 93, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:25:55,414 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 05:25:56,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1478760.0, ans=0.0 2024-08-12 05:26:02,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-12 05:26:04,430 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 15 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 05:26:14,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1478860.0, ans=0.125 2024-08-12 05:26:17,214 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 05:26:48,790 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+01 2.658e+01 2.945e+01 3.393e+01 5.337e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 05:26:52,696 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
30 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 05:26:54,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1479160.0, ans=0.125 2024-08-12 05:27:00,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3000, loss[loss=0.09446, beats_loss=0.0102, ecapa_loss=0.000218, whisper_loss=0.08208, over 14698.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01119, ecapa_loss=0.0001804, whisper_loss=0.09189, over 3927405.93 frames. ], batch size: 60, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:27:00,067 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 05:27:41,690 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on ASR_libri: loss=0.2561, beats_loss=0, ecapa_loss=0.0006006, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 05:27:58,667 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on SV_voxceleb1: loss=0.004832, beats_loss=0, ecapa_loss=0.0004832, whisper_loss=0, over 939242.00 frames. 2024-08-12 05:29:46,685 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4607, 4.7337, 5.3030, 5.4363], device='cuda:0') 2024-08-12 05:30:00,032 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on AT_audioset: loss=0.02445, beats_loss=0.02445, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 05:30:00,037 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 05:30:20,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1479360.0, ans=0.125 2024-08-12 05:30:27,024 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 05:30:28,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.42 vs. 
limit=12.0 2024-08-12 05:30:44,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1479560.0, ans=0.0 2024-08-12 05:30:44,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-12 05:30:45,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1479560.0, ans=0.125 2024-08-12 05:31:04,624 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3050, loss[loss=0.1159, beats_loss=0.01119, ecapa_loss=0.0001805, whisper_loss=0.103, over 22608.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01113, ecapa_loss=0.0001809, whisper_loss=0.09274, over 3939160.28 frames. ], batch size: 91, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:31:06,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1479760.0, ans=0.125 2024-08-12 05:31:14,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1479760.0, ans=0.2 2024-08-12 05:31:30,938 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 35 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 05:31:34,839 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-148000.pt 2024-08-12 05:31:49,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1480060.0, ans=0.2 2024-08-12 05:31:58,820 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 05:32:00,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.536e+01 2.925e+01 3.464e+01 9.985e+01, threshold=5.850e+01, percent-clipped=2.0 2024-08-12 05:32:03,552 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 05:32:12,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3100, loss[loss=0.09768, beats_loss=0.01345, ecapa_loss=0.0002145, whisper_loss=0.08208, over 21733.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01117, ecapa_loss=0.0001812, whisper_loss=0.09243, over 3925694.14 frames. ], batch size: 92, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:32:28,269 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 05:32:59,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1480560.0, ans=0.125 2024-08-12 05:33:12,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1480660.0, ans=0.125 2024-08-12 05:33:12,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1480660.0, ans=0.125 2024-08-12 05:33:17,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3150, loss[loss=0.1248, beats_loss=0.01061, ecapa_loss=0.0001719, whisper_loss=0.1125, over 23340.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01124, ecapa_loss=0.0001807, whisper_loss=0.09181, over 3924458.87 frames. ], batch size: 93, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:33:17,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1480760.0, ans=0.125 2024-08-12 05:33:25,377 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 05:33:27,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-12 05:33:44,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1480960.0, ans=0.1 2024-08-12 05:33:53,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-12 05:33:57,867 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-12 05:33:59,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1481060.0, ans=0.0 2024-08-12 05:34:01,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1481060.0, ans=0.5 2024-08-12 05:34:04,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1481060.0, ans=0.2 2024-08-12 05:34:06,694 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 05:34:07,901 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 05:34:10,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.183e+01 2.633e+01 2.990e+01 3.410e+01 4.926e+01, threshold=5.980e+01, percent-clipped=0.0 2024-08-12 05:34:15,989 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
24 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 05:34:17,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1481160.0, ans=0.125 2024-08-12 05:34:22,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3200, loss[loss=0.1139, beats_loss=0.01052, ecapa_loss=0.0001829, whisper_loss=0.1016, over 14214.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01122, ecapa_loss=0.0001799, whisper_loss=0.0923, over 3904356.39 frames. ], batch size: 56, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:34:35,560 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 05:34:40,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1481360.0, ans=0.1 2024-08-12 05:34:51,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1481460.0, ans=0.125 2024-08-12 05:34:59,085 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 05:35:27,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3250, loss[loss=0.1154, beats_loss=0.008177, ecapa_loss=0.0002149, whisper_loss=0.1051, over 16357.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01121, ecapa_loss=0.0001817, whisper_loss=0.092, over 3922959.84 frames. ], batch size: 64, lr: 5.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:35:31,483 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-12 05:35:44,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1481860.0, ans=0.125 2024-08-12 05:35:45,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1481860.0, ans=0.0 2024-08-12 05:35:58,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1481960.0, ans=0.2 2024-08-12 05:36:06,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-12 05:36:21,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.550e+01 2.874e+01 3.283e+01 4.994e+01, threshold=5.748e+01, percent-clipped=0.0 2024-08-12 05:36:33,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3300, loss[loss=0.1029, beats_loss=0.0116, ecapa_loss=0.0001242, whisper_loss=0.09009, over 14710.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01126, ecapa_loss=0.0001803, whisper_loss=0.09126, over 3910327.22 frames. ], batch size: 54, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:36:33,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=22.5 2024-08-12 05:36:44,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1482260.0, ans=0.2 2024-08-12 05:36:47,536 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 05:37:10,758 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 05:37:11,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.83 vs. limit=15.0 2024-08-12 05:37:37,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3350, loss[loss=0.08738, beats_loss=0.009129, ecapa_loss=0.0002164, whisper_loss=0.07609, over 16986.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01118, ecapa_loss=0.0001811, whisper_loss=0.09183, over 3906157.85 frames. ], batch size: 70, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:37:43,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1482760.0, ans=0.125 2024-08-12 05:38:00,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1482860.0, ans=0.1 2024-08-12 05:38:25,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.72 vs. limit=10.0 2024-08-12 05:38:30,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.536e+01 3.034e+01 3.396e+01 1.773e+02, threshold=6.068e+01, percent-clipped=2.0 2024-08-12 05:38:32,241 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:38:33,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1483160.0, ans=0.025 2024-08-12 05:38:36,250 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:38:42,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3400, loss[loss=0.1173, beats_loss=0.01139, ecapa_loss=0.000243, whisper_loss=0.1035, over 20982.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01124, ecapa_loss=0.0001816, whisper_loss=0.09134, over 3925753.46 frames. ], batch size: 88, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:38:42,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1483260.0, ans=0.0 2024-08-12 05:38:43,901 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 05:38:44,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1483260.0, ans=0.0 2024-08-12 05:38:47,878 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 05:38:52,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1483260.0, ans=0.125 2024-08-12 05:39:00,842 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 05:39:17,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1483460.0, ans=0.05 2024-08-12 05:39:18,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1483460.0, ans=0.0 2024-08-12 05:39:24,066 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 05:39:35,962 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 05:39:38,845 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 05:39:41,393 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 05:39:46,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1483660.0, ans=0.125 2024-08-12 05:39:49,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3450, loss[loss=0.1096, beats_loss=0.01065, ecapa_loss=0.0001917, whisper_loss=0.09708, over 14824.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01125, ecapa_loss=0.0001809, whisper_loss=0.09085, over 3903811.68 frames. ], batch size: 59, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:39:52,881 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 05:39:58,014 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 05:39:59,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2024-08-12 05:40:02,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-08-12 05:40:06,074 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 05:40:12,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1483860.0, ans=0.125 2024-08-12 05:40:15,939 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 05:40:46,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.618e+01 3.064e+01 3.498e+01 5.812e+01, threshold=6.129e+01, percent-clipped=0.0 2024-08-12 05:40:59,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3500, loss[loss=0.1042, beats_loss=0.01274, ecapa_loss=0.0001842, whisper_loss=0.08961, over 22240.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01119, ecapa_loss=0.000182, whisper_loss=0.09093, over 3903571.90 frames. ], batch size: 91, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:41:01,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1484260.0, ans=0.025 2024-08-12 05:41:04,735 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-12 05:41:09,001 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:41:20,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1484360.0, ans=0.125 2024-08-12 05:41:24,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1484360.0, ans=0.125 2024-08-12 05:41:29,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1484460.0, ans=0.125 2024-08-12 05:41:39,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1484460.0, ans=0.125 2024-08-12 05:41:50,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1484560.0, ans=0.125 2024-08-12 05:41:50,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1484560.0, ans=0.125 2024-08-12 05:41:52,941 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 05:42:03,409 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-12 05:42:10,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3550, loss[loss=0.1165, beats_loss=0.01023, ecapa_loss=0.0001545, whisper_loss=0.1048, over 22367.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01111, ecapa_loss=0.0001811, whisper_loss=0.09161, over 3914430.32 frames. ], batch size: 88, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:42:13,489 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 05:42:16,434 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 05:42:20,393 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 05:42:44,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1484960.0, ans=0.0 2024-08-12 05:42:49,994 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 05:43:09,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.668e+01 2.975e+01 3.438e+01 5.088e+01, threshold=5.950e+01, percent-clipped=0.0 2024-08-12 05:43:16,140 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 05:43:20,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1485160.0, ans=0.2 2024-08-12 05:43:21,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1485260.0, ans=0.125 2024-08-12 05:43:22,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3600, loss[loss=0.09804, beats_loss=0.01149, ecapa_loss=0.0002262, whisper_loss=0.08428, over 16834.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001814, whisper_loss=0.09209, over 3916577.93 frames. 
], batch size: 70, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:43:33,458 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 05:43:49,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1485360.0, ans=0.125 2024-08-12 05:44:03,131 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 05:44:19,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1485660.0, ans=0.1 2024-08-12 05:44:26,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1485660.0, ans=0.1 2024-08-12 05:44:33,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3650, loss[loss=0.1172, beats_loss=0.009965, ecapa_loss=0.0002063, whisper_loss=0.1052, over 22796.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01115, ecapa_loss=0.0001807, whisper_loss=0.0914, over 3894303.35 frames. ], batch size: 91, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:44:45,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1485760.0, ans=0.1 2024-08-12 05:44:55,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-12 05:45:00,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. 
limit=15.0 2024-08-12 05:45:24,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1486060.0, ans=0.2 2024-08-12 05:45:25,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1486060.0, ans=0.0 2024-08-12 05:45:28,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1486060.0, ans=0.0 2024-08-12 05:45:31,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1486160.0, ans=0.0 2024-08-12 05:45:32,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.529e+01 2.870e+01 3.231e+01 5.224e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-12 05:45:45,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3700, loss[loss=0.09891, beats_loss=0.01007, ecapa_loss=0.0001999, whisper_loss=0.08685, over 20397.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01115, ecapa_loss=0.0001811, whisper_loss=0.09182, over 3879884.28 frames. ], batch size: 83, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:45:52,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.11 vs. limit=22.5 2024-08-12 05:45:53,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1486260.0, ans=0.125 2024-08-12 05:46:02,417 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 05:46:09,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486360.0, ans=0.1 2024-08-12 05:46:23,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1486460.0, ans=0.125 2024-08-12 05:46:27,457 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 05:46:30,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1486560.0, ans=0.2 2024-08-12 05:46:32,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1486560.0, ans=0.125 2024-08-12 05:46:43,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1486660.0, ans=0.0 2024-08-12 05:46:51,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1486660.0, ans=0.0 2024-08-12 05:46:57,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3750, loss[loss=0.08769, beats_loss=0.01335, ecapa_loss=0.0001629, whisper_loss=0.07271, over 21382.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01119, ecapa_loss=0.0001803, whisper_loss=0.09178, over 3877713.88 frames. ], batch size: 88, lr: 5.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:47:01,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1486760.0, ans=0.1 2024-08-12 05:47:07,462 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 05:47:20,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1486860.0, ans=15.0 2024-08-12 05:47:24,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2024-08-12 05:47:35,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1486960.0, ans=0.0 2024-08-12 05:47:35,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. limit=10.0 2024-08-12 05:47:36,625 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 05:47:39,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1487060.0, ans=0.0 2024-08-12 05:47:55,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.578e+01 2.846e+01 3.197e+01 4.164e+01, threshold=5.692e+01, percent-clipped=0.0 2024-08-12 05:48:07,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0 2024-08-12 05:48:09,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3800, loss[loss=0.09707, beats_loss=0.01154, ecapa_loss=0.000189, whisper_loss=0.08364, over 14284.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.000181, whisper_loss=0.09272, over 3891153.46 frames. 
], batch size: 57, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:48:09,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1487260.0, ans=0.2 2024-08-12 05:48:10,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1487260.0, ans=0.0 2024-08-12 05:48:22,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2024-08-12 05:48:27,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1487360.0, ans=0.125 2024-08-12 05:48:35,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=15.0 2024-08-12 05:48:39,215 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 05:49:00,395 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 05:49:04,861 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-12 05:49:09,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1487660.0, ans=0.0 2024-08-12 05:49:16,380 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 05:49:22,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3850, loss[loss=0.1004, beats_loss=0.009145, ecapa_loss=0.0002036, whisper_loss=0.08917, over 19975.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01107, ecapa_loss=0.0001802, whisper_loss=0.09367, over 3903883.65 frames. 
], batch size: 80, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:49:32,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1487760.0, ans=0.125 2024-08-12 05:49:53,272 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 05:49:53,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2024-08-12 05:50:08,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1488060.0, ans=0.125 2024-08-12 05:50:16,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1488060.0, ans=0.1 2024-08-12 05:50:22,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.543e+01 2.911e+01 3.298e+01 4.140e+01, threshold=5.821e+01, percent-clipped=0.0 2024-08-12 05:50:35,787 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3900, loss[loss=0.07866, beats_loss=0.01193, ecapa_loss=0.0001865, whisper_loss=0.06486, over 13568.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01097, ecapa_loss=0.0001808, whisper_loss=0.09454, over 3923973.67 frames. ], batch size: 56, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:51:07,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1488460.0, ans=0.1 2024-08-12 05:51:16,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1488460.0, ans=0.2 2024-08-12 05:51:30,888 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 05:51:32,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1488560.0, ans=0.1 2024-08-12 05:51:34,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1488560.0, ans=0.1 2024-08-12 05:51:44,406 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 05:51:51,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 3950, loss[loss=0.1234, beats_loss=0.009047, ecapa_loss=0.000206, whisper_loss=0.1123, over 22352.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01096, ecapa_loss=0.0001821, whisper_loss=0.09375, over 3896976.96 frames. ], batch size: 88, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:51:57,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-08-12 05:52:14,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1488860.0, ans=0.2 2024-08-12 05:52:45,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1489060.0, ans=0.0 2024-08-12 05:52:53,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.124e+01 2.639e+01 2.878e+01 3.466e+01 7.368e+01, threshold=5.755e+01, percent-clipped=1.0 2024-08-12 05:53:07,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4000, loss[loss=0.1181, beats_loss=0.01045, ecapa_loss=0.0001624, whisper_loss=0.106, over 22239.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01097, ecapa_loss=0.000182, whisper_loss=0.09414, over 3901816.71 frames. 
], batch size: 86, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:53:10,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1489260.0, ans=0.1 2024-08-12 05:53:11,732 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 05:53:20,867 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-12 05:53:21,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.91 vs. limit=22.5 2024-08-12 05:53:32,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1489360.0, ans=0.125 2024-08-12 05:53:49,598 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 05:53:55,949 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 05:54:03,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1489560.0, ans=0.0 2024-08-12 05:54:04,944 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 05:54:19,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-12 05:54:20,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1489660.0, ans=0.125 2024-08-12 05:54:23,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4050, loss[loss=0.1312, beats_loss=0.009065, ecapa_loss=0.0001911, whisper_loss=0.1202, over 23455.00 frames. ], tot_loss[loss=0.1073, beats_loss=0.01092, ecapa_loss=0.0001837, whisper_loss=0.09459, over 3902896.18 frames. 
], batch size: 92, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:54:23,177 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 05:54:41,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1489860.0, ans=0.125 2024-08-12 05:55:08,892 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 05:55:14,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1490060.0, ans=0.125 2024-08-12 05:55:25,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.625e+01 2.909e+01 3.364e+01 7.852e+01, threshold=5.817e+01, percent-clipped=2.0 2024-08-12 05:55:36,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1490160.0, ans=0.015 2024-08-12 05:55:39,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4100, loss[loss=0.09295, beats_loss=0.01309, ecapa_loss=0.0001865, whisper_loss=0.078, over 19474.00 frames. ], tot_loss[loss=0.1076, beats_loss=0.01093, ecapa_loss=0.0001821, whisper_loss=0.09483, over 3904917.16 frames. ], batch size: 75, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:55:40,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1490260.0, ans=0.0 2024-08-12 05:56:12,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.77 vs. limit=15.0 2024-08-12 05:56:16,913 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 05:56:28,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. 
limit=12.0 2024-08-12 05:56:45,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=1490660.0, ans=0.1 2024-08-12 05:56:48,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1490660.0, ans=0.125 2024-08-12 05:56:56,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4150, loss[loss=0.1396, beats_loss=0.008922, ecapa_loss=0.0002019, whisper_loss=0.1287, over 20333.00 frames. ], tot_loss[loss=0.1074, beats_loss=0.01099, ecapa_loss=0.0001813, whisper_loss=0.09464, over 3890017.49 frames. ], batch size: 77, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:56:56,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1490760.0, ans=0.0 2024-08-12 05:57:07,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2024-08-12 05:57:15,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-08-12 05:57:17,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.94 vs. limit=22.5 2024-08-12 05:57:21,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1490860.0, ans=0.0 2024-08-12 05:57:22,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1490860.0, ans=0.2 2024-08-12 05:57:30,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1490960.0, ans=0.125 2024-08-12 05:57:33,173 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 05:57:33,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1490960.0, ans=0.125 2024-08-12 05:57:53,431 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 30 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 05:57:53,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1491060.0, ans=0.0 2024-08-12 05:58:01,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.619e+01 2.867e+01 3.217e+01 5.431e+01, threshold=5.734e+01, percent-clipped=0.0 2024-08-12 05:58:15,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4200, loss[loss=0.0945, beats_loss=0.01496, ecapa_loss=0.0001445, whisper_loss=0.07809, over 18048.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01117, ecapa_loss=0.0001797, whisper_loss=0.09336, over 3880990.10 frames. ], batch size: 73, lr: 5.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 05:58:20,327 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-12 05:59:07,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1491560.0, ans=0.125 2024-08-12 05:59:10,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1491560.0, ans=0.125 2024-08-12 05:59:20,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-12 05:59:34,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4250, loss[loss=0.1054, beats_loss=0.01228, ecapa_loss=0.0001441, whisper_loss=0.09169, over 22513.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01118, ecapa_loss=0.0001789, whisper_loss=0.09262, over 3876660.57 frames. 
], batch size: 88, lr: 5.80e-03, grad_scale: 1.152921504606847e+18 2024-08-12 05:59:52,779 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 32 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-12 06:00:04,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1491860.0, ans=0.125 2024-08-12 06:00:09,002 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-12 06:00:12,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-12 06:00:15,124 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 06:00:18,461 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 06:00:20,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-12 06:00:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1492060.0, ans=0.2 2024-08-12 06:00:31,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. 
limit=15.0 2024-08-12 06:00:32,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1492060.0, ans=0.125 2024-08-12 06:00:37,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1492160.0, ans=0.0 2024-08-12 06:00:40,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.448e+01 2.725e+01 3.062e+01 4.978e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-12 06:00:41,012 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:00:50,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1492160.0, ans=0.2 2024-08-12 06:00:56,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4300, loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.00018, whisper_loss=0.09195, over 19727.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01113, ecapa_loss=0.0001797, whisper_loss=0.09263, over 3873502.14 frames. ], batch size: 77, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:01:34,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1492460.0, ans=0.125 2024-08-12 06:01:58,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-08-12 06:02:16,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4350, loss[loss=0.1083, beats_loss=0.01104, ecapa_loss=0.0001476, whisper_loss=0.09583, over 24179.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0111, ecapa_loss=0.0001803, whisper_loss=0.09168, over 3846676.56 frames. ], batch size: 94, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:02:20,232 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 06:02:38,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1492860.0, ans=0.0 2024-08-12 06:02:47,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1492860.0, ans=0.125 2024-08-12 06:02:48,954 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 06:02:50,332 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 06:02:53,164 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 06:03:17,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1493060.0, ans=0.125 2024-08-12 06:03:25,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.573e+01 2.985e+01 3.568e+01 9.873e+01, threshold=5.969e+01, percent-clipped=3.0 2024-08-12 06:03:40,605 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4400, loss[loss=0.09149, beats_loss=0.0104, ecapa_loss=0.0001643, whisper_loss=0.07944, over 15321.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001802, whisper_loss=0.09251, over 3860852.27 frames. ], batch size: 59, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:03:47,717 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 06:03:54,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1493260.0, ans=0.125 2024-08-12 06:04:00,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1493360.0, ans=0.125 2024-08-12 06:04:15,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1493460.0, ans=0.125 2024-08-12 06:04:29,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1493560.0, ans=0.2 2024-08-12 06:04:47,989 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 06:05:00,697 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 06:05:05,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4450, loss[loss=0.09059, beats_loss=0.01261, ecapa_loss=0.0001623, whisper_loss=0.07636, over 18204.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01099, ecapa_loss=0.0001804, whisper_loss=0.09293, over 3865473.56 frames. 
], batch size: 73, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:05:11,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1493760.0, ans=0.0 2024-08-12 06:05:54,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1494060.0, ans=0.0 2024-08-12 06:06:10,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1494160.0, ans=0.125 2024-08-12 06:06:13,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.568e+01 2.752e+01 3.153e+01 4.560e+01, threshold=5.503e+01, percent-clipped=0.0 2024-08-12 06:06:16,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1494160.0, ans=0.125 2024-08-12 06:06:29,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4500, loss[loss=0.1088, beats_loss=0.009627, ecapa_loss=0.000193, whisper_loss=0.09728, over 22017.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01091, ecapa_loss=0.0001812, whisper_loss=0.09331, over 3851361.02 frames. ], batch size: 85, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:06:42,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-12 06:07:22,858 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 06:07:23,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1494560.0, ans=0.09899494936611666 2024-08-12 06:07:26,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1494560.0, ans=0.1 2024-08-12 06:07:55,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4550, loss[loss=0.08652, beats_loss=0.01137, ecapa_loss=0.0001815, whisper_loss=0.07333, over 18321.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01094, ecapa_loss=0.0001828, whisper_loss=0.093, over 3844953.78 frames. ], batch size: 74, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:08:43,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-12 06:09:01,669 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 06:09:05,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.505e+01 2.717e+01 3.004e+01 5.094e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-12 06:09:18,061 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 7 from Vox, 30 fro AS 2024-08-12 06:09:20,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4600, loss[loss=0.08842, beats_loss=0.01534, ecapa_loss=0.0001549, whisper_loss=0.07153, over 18221.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.0001814, whisper_loss=0.09291, over 3837302.22 frames. 
], batch size: 72, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:09:28,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1495260.0, ans=0.09899494936611666 2024-08-12 06:09:28,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-12 06:09:36,092 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 06:10:09,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1495560.0, ans=0.125 2024-08-12 06:10:09,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1495560.0, ans=0.125 2024-08-12 06:10:21,491 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 06:10:44,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4650, loss[loss=0.09448, beats_loss=0.01403, ecapa_loss=0.0001526, whisper_loss=0.07892, over 22790.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01104, ecapa_loss=0.000182, whisper_loss=0.0926, over 3827467.78 frames. ], batch size: 91, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:10:52,227 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 06:10:55,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1495760.0, ans=0.125 2024-08-12 06:11:04,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1495860.0, ans=0.025 2024-08-12 06:11:07,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1495860.0, ans=0.2 2024-08-12 06:11:44,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1496060.0, ans=0.0 2024-08-12 06:11:52,664 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 06:11:54,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.541e+01 2.726e+01 3.242e+01 5.233e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 06:12:00,688 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-12 06:12:05,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.46 vs. limit=10.0 2024-08-12 06:12:06,196 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 06:12:09,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4700, loss[loss=0.1096, beats_loss=0.01006, ecapa_loss=0.0001609, whisper_loss=0.09797, over 18186.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01104, ecapa_loss=0.000182, whisper_loss=0.09275, over 3822419.81 frames. 
], batch size: 67, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:12:16,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1496260.0, ans=0.1 2024-08-12 06:12:18,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1496260.0, ans=0.125 2024-08-12 06:12:20,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1496260.0, ans=0.0 2024-08-12 06:12:27,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-08-12 06:12:29,517 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 06:12:53,347 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 06:13:03,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-12 06:13:13,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1496660.0, ans=0.125 2024-08-12 06:13:30,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4750, loss[loss=0.0875, beats_loss=0.01292, ecapa_loss=0.0002049, whisper_loss=0.07253, over 21607.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01102, ecapa_loss=0.0001823, whisper_loss=0.09255, over 3841253.84 frames. 
], batch size: 93, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:13:33,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1496760.0, ans=0.125 2024-08-12 06:13:41,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1496760.0, ans=0.2 2024-08-12 06:13:41,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-08-12 06:13:52,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1496860.0, ans=0.1 2024-08-12 06:13:55,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1496860.0, ans=0.0 2024-08-12 06:14:00,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1496860.0, ans=0.125 2024-08-12 06:14:16,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1496960.0, ans=0.1 2024-08-12 06:14:22,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1497060.0, ans=0.125 2024-08-12 06:14:23,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1497060.0, ans=0.1 2024-08-12 06:14:36,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.611e+01 2.918e+01 3.267e+01 6.538e+01, threshold=5.836e+01, percent-clipped=2.0 2024-08-12 06:14:44,636 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 06:14:51,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4800, loss[loss=0.1162, beats_loss=0.01109, ecapa_loss=0.0001886, whisper_loss=0.1032, over 22181.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001817, whisper_loss=0.09222, over 3854930.48 frames. ], batch size: 90, lr: 5.79e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:14:56,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1497260.0, ans=0.125 2024-08-12 06:15:01,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1497260.0, ans=0.125 2024-08-12 06:15:10,749 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 06:15:17,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1497360.0, ans=0.0 2024-08-12 06:15:56,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-12 06:16:13,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4850, loss[loss=0.09656, beats_loss=0.01054, ecapa_loss=0.0001517, whisper_loss=0.0845, over 21684.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01106, ecapa_loss=0.000183, whisper_loss=0.09263, over 3863585.71 frames. ], batch size: 84, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:16:17,707 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 06:16:26,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=12.0 2024-08-12 06:16:46,522 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 06:16:56,723 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 06:17:00,950 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 32 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 06:17:12,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1498060.0, ans=0.125 2024-08-12 06:17:19,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.592e+01 2.912e+01 3.180e+01 4.291e+01, threshold=5.823e+01, percent-clipped=0.0 2024-08-12 06:17:34,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4900, loss[loss=0.1141, beats_loss=0.01138, ecapa_loss=0.0001958, whisper_loss=0.1007, over 21871.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01113, ecapa_loss=0.0001824, whisper_loss=0.09228, over 3890720.20 frames. ], batch size: 87, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:18:04,570 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.021e+00 2024-08-12 06:18:06,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-12 06:18:22,881 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 06:18:36,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1498560.0, ans=0.2 2024-08-12 06:18:40,641 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 06:18:42,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1498660.0, ans=0.125 2024-08-12 06:18:57,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 4950, loss[loss=0.1254, beats_loss=0.008061, ecapa_loss=0.0001943, whisper_loss=0.1154, over 15455.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01119, ecapa_loss=0.0001808, whisper_loss=0.09173, over 3869512.18 frames. ], batch size: 59, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:19:28,531 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 06:19:54,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-12 06:20:04,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.693e+01 3.094e+01 3.524e+01 6.311e+01, threshold=6.188e+01, percent-clipped=2.0 2024-08-12 06:20:19,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5000, loss[loss=0.08944, beats_loss=0.01173, ecapa_loss=0.0002389, whisper_loss=0.07532, over 13172.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01115, ecapa_loss=0.0001819, whisper_loss=0.09242, over 3855188.01 frames. ], batch size: 54, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:20:25,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1499260.0, ans=0.125 2024-08-12 06:20:25,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-08-12 06:20:32,960 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
22 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 06:20:39,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1499360.0, ans=0.0 2024-08-12 06:20:48,902 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 06:21:08,708 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 06:21:10,158 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-12 06:21:19,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1499560.0, ans=0.125 2024-08-12 06:21:31,684 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 06:21:35,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1499660.0, ans=0.125 2024-08-12 06:21:37,636 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 06:21:41,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5050, loss[loss=0.1169, beats_loss=0.01042, ecapa_loss=0.0001771, whisper_loss=0.1047, over 22086.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01117, ecapa_loss=0.0001823, whisper_loss=0.09217, over 3872578.90 frames. 
], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:21:59,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1499860.0, ans=0.125 2024-08-12 06:22:01,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1499860.0, ans=0.125 2024-08-12 06:22:08,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1499860.0, ans=0.2 2024-08-12 06:22:13,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1499860.0, ans=0.2 2024-08-12 06:22:20,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1499960.0, ans=0.125 2024-08-12 06:22:22,487 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 06:22:37,016 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 06:22:42,155 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 33 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 06:22:48,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1500160.0, ans=0.1 2024-08-12 06:22:51,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.582e+01 2.858e+01 3.371e+01 2.461e+02, threshold=5.717e+01, percent-clipped=1.0 2024-08-12 06:23:00,096 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-12 06:23:05,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5100, loss[loss=0.1058, beats_loss=0.01031, ecapa_loss=0.0001884, whisper_loss=0.09362, over 22434.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01115, ecapa_loss=0.0001806, whisper_loss=0.09215, over 3875891.82 frames. ], batch size: 90, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:23:47,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1500460.0, ans=0.125 2024-08-12 06:23:52,089 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 06:23:53,451 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 06:24:07,325 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 06:24:13,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1500660.0, ans=0.0 2024-08-12 06:24:27,317 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5150, loss[loss=0.1069, beats_loss=0.01178, ecapa_loss=0.0001881, whisper_loss=0.09326, over 21969.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01115, ecapa_loss=0.0001796, whisper_loss=0.09172, over 3875289.95 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:24:33,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2024-08-12 06:24:37,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. 
limit=15.0 2024-08-12 06:24:40,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1500760.0, ans=0.0 2024-08-12 06:25:06,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1500960.0, ans=0.1 2024-08-12 06:25:56,992 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 06:26:00,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1501160.0, ans=0.1 2024-08-12 06:26:03,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.805e+01 3.216e+01 1.904e+02, threshold=5.610e+01, percent-clipped=1.0 2024-08-12 06:26:05,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1501160.0, ans=0.0 2024-08-12 06:26:20,034 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 06:26:21,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5200, loss[loss=0.1326, beats_loss=0.01054, ecapa_loss=0.0001766, whisper_loss=0.1203, over 22369.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01103, ecapa_loss=0.0001809, whisper_loss=0.09267, over 3873455.73 frames. ], batch size: 89, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:26:28,186 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 06:26:32,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. 
limit=15.0 2024-08-12 06:26:35,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1501260.0, ans=0.2 2024-08-12 06:26:44,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=12.0 2024-08-12 06:26:47,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1501360.0, ans=0.0 2024-08-12 06:27:14,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1501560.0, ans=0.125 2024-08-12 06:27:38,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2024-08-12 06:27:49,676 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5250, loss[loss=0.1013, beats_loss=0.01006, ecapa_loss=0.00017, whisper_loss=0.0895, over 14694.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01109, ecapa_loss=0.0001807, whisper_loss=0.09208, over 3858319.65 frames. ], batch size: 56, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:27:49,951 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 fro AS 2024-08-12 06:27:52,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-08-12 06:28:06,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-12 06:28:07,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1501860.0, ans=0.125 2024-08-12 06:28:16,878 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 06:28:19,941 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 06:28:38,034 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 06:28:41,242 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 30 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 06:28:58,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.556e+01 2.811e+01 3.138e+01 9.826e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 06:29:07,349 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 06:29:12,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1502260.0, ans=0.2 2024-08-12 06:29:13,734 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5300, loss[loss=0.09446, beats_loss=0.01117, ecapa_loss=0.0001713, whisper_loss=0.08157, over 22778.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01112, ecapa_loss=0.0001793, whisper_loss=0.09283, over 3874898.23 frames. ], batch size: 90, lr: 5.78e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:29:20,534 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 06:29:57,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1502460.0, ans=0.1 2024-08-12 06:30:02,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.37 vs. limit=12.0 2024-08-12 06:30:18,975 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 06:30:36,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5350, loss[loss=0.09627, beats_loss=0.01352, ecapa_loss=0.0001555, whisper_loss=0.08119, over 21462.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01105, ecapa_loss=0.0001801, whisper_loss=0.09296, over 3861364.33 frames. ], batch size: 89, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:30:38,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1502760.0, ans=0.125 2024-08-12 06:30:58,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1502860.0, ans=0.125 2024-08-12 06:31:06,401 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 06:31:06,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1502860.0, ans=0.125 2024-08-12 06:31:06,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1502860.0, ans=0.125 2024-08-12 06:31:08,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1502960.0, ans=0.125 2024-08-12 06:31:37,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1503060.0, ans=0.0 2024-08-12 06:31:42,435 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 06:31:43,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.466e+01 2.824e+01 3.264e+01 5.204e+01, threshold=5.648e+01, percent-clipped=0.0 2024-08-12 06:31:45,582 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 06:31:50,392 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 06:31:57,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5400, loss[loss=0.1171, beats_loss=0.009362, ecapa_loss=0.0001961, whisper_loss=0.1058, over 16744.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01096, ecapa_loss=0.000179, whisper_loss=0.0938, over 3903883.16 frames. ], batch size: 63, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:32:33,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1503460.0, ans=0.125 2024-08-12 06:32:36,531 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 06:32:37,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-08-12 06:32:39,342 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 06:32:41,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1503460.0, ans=0.09899494936611666 2024-08-12 06:32:44,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503560.0, ans=0.1 2024-08-12 06:32:52,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1503560.0, ans=0.0 2024-08-12 06:33:17,796 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5450, loss[loss=0.08968, beats_loss=0.01164, ecapa_loss=0.000161, whisper_loss=0.07643, over 20894.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01104, ecapa_loss=0.0001798, whisper_loss=0.0929, over 3899495.41 frames. 
], batch size: 84, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:33:18,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-12 06:33:19,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1503760.0, ans=0.0 2024-08-12 06:33:30,398 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 06:33:32,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1503860.0, ans=0.125 2024-08-12 06:33:37,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1503860.0, ans=0.125 2024-08-12 06:33:42,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1503860.0, ans=0.0 2024-08-12 06:33:44,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1503860.0, ans=0.0 2024-08-12 06:33:49,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1503960.0, ans=0.1 2024-08-12 06:33:54,471 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 06:34:23,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.572e+01 2.892e+01 3.418e+01 4.149e+01, threshold=5.785e+01, percent-clipped=0.0 2024-08-12 06:34:28,113 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 06:34:30,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1504160.0, ans=0.125 2024-08-12 06:34:37,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5500, loss[loss=0.09676, beats_loss=0.01212, ecapa_loss=0.0001455, whisper_loss=0.08318, over 16634.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01108, ecapa_loss=0.000181, whisper_loss=0.09242, over 3866809.44 frames. ], batch size: 63, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:34:44,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1504260.0, ans=0.125 2024-08-12 06:34:51,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1504260.0, ans=0.0 2024-08-12 06:34:51,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2024-08-12 06:34:56,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1504360.0, ans=0.2 2024-08-12 06:35:00,096 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 06:35:00,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-08-12 06:35:17,401 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 06:35:19,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1504460.0, ans=0.125 2024-08-12 06:35:31,192 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 06:35:56,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5550, loss[loss=0.1073, beats_loss=0.01166, ecapa_loss=0.0001525, whisper_loss=0.09414, over 21987.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01112, ecapa_loss=0.0001801, whisper_loss=0.09208, over 3875171.28 frames. ], batch size: 87, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:35:59,905 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 06:36:01,806 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 06:36:11,032 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-12 06:36:14,158 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 06:36:16,168 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 06:36:23,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1504860.0, ans=0.125 2024-08-12 06:36:29,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1504960.0, ans=0.125 2024-08-12 06:36:34,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1504960.0, ans=0.0 2024-08-12 06:36:35,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2024-08-12 06:36:51,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1505060.0, ans=0.125 2024-08-12 06:36:54,676 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 06:37:01,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.511e+01 2.832e+01 3.131e+01 5.675e+01, threshold=5.663e+01, percent-clipped=0.0 2024-08-12 06:37:13,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1505160.0, ans=0.0 2024-08-12 06:37:13,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1505160.0, ans=0.2 2024-08-12 06:37:16,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5600, loss[loss=0.1074, beats_loss=0.0119, ecapa_loss=0.0002174, whisper_loss=0.09336, over 18724.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001801, whisper_loss=0.0918, over 3880593.11 frames. ], batch size: 75, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:37:16,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1505260.0, ans=0.1 2024-08-12 06:37:18,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1505260.0, ans=0.07 2024-08-12 06:37:24,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1505260.0, ans=0.125 2024-08-12 06:37:30,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1505360.0, ans=0.1 2024-08-12 06:37:32,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1505360.0, ans=0.125 2024-08-12 06:37:43,782 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
35 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 06:37:58,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1505460.0, ans=0.125 2024-08-12 06:38:02,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-08-12 06:38:08,245 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 06:38:10,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1505560.0, ans=0.125 2024-08-12 06:38:16,231 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 06:38:39,761 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5650, loss[loss=0.1052, beats_loss=0.01015, ecapa_loss=0.000177, whisper_loss=0.09328, over 14009.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01115, ecapa_loss=0.0001789, whisper_loss=0.09182, over 3896212.38 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:38:42,881 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 06:38:59,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1505860.0, ans=0.0 2024-08-12 06:39:01,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1505860.0, ans=0.0 2024-08-12 06:39:04,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.46 vs. 
limit=15.0 2024-08-12 06:39:10,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1505960.0, ans=0.0 2024-08-12 06:39:18,188 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-12 06:39:18,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1505960.0, ans=0.05 2024-08-12 06:39:20,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1505960.0, ans=0.0 2024-08-12 06:39:29,354 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 27 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 06:39:29,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1506060.0, ans=0.1 2024-08-12 06:39:38,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1506060.0, ans=0.0 2024-08-12 06:39:44,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.537e+01 2.761e+01 3.260e+01 5.240e+01, threshold=5.523e+01, percent-clipped=0.0 2024-08-12 06:39:45,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1506160.0, ans=0.125 2024-08-12 06:39:59,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5700, loss[loss=0.08718, beats_loss=0.01308, ecapa_loss=0.0001804, whisper_loss=0.07229, over 19166.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.0001809, whisper_loss=0.09159, over 3917800.00 frames. 
], batch size: 78, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:40:07,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1506260.0, ans=0.1 2024-08-12 06:40:18,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1506360.0, ans=0.125 2024-08-12 06:40:20,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1506360.0, ans=0.125 2024-08-12 06:40:27,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1506360.0, ans=0.0 2024-08-12 06:40:31,666 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 06:40:47,528 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 06:40:51,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1506560.0, ans=0.05 2024-08-12 06:40:56,406 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-12 06:40:57,620 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 06:41:03,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1506560.0, ans=0.125 2024-08-12 06:41:12,440 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 06:41:13,934 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 06:41:15,641 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 06:41:17,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.91 vs. limit=12.0 2024-08-12 06:41:21,953 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5750, loss[loss=0.118, beats_loss=0.01314, ecapa_loss=0.0001409, whisper_loss=0.1035, over 23638.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01115, ecapa_loss=0.0001811, whisper_loss=0.09127, over 3906026.91 frames. ], batch size: 90, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:41:35,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1506760.0, ans=0.125 2024-08-12 06:41:43,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1506860.0, ans=0.0 2024-08-12 06:41:51,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1506860.0, ans=0.2 2024-08-12 06:41:52,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1506960.0, ans=0.1 2024-08-12 06:41:55,471 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-12 06:41:58,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1506960.0, ans=0.125 2024-08-12 06:42:01,643 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 06:42:06,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1506960.0, ans=0.125 2024-08-12 06:42:09,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1507060.0, ans=0.2 2024-08-12 06:42:27,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.547e+01 2.860e+01 3.182e+01 5.592e+01, threshold=5.721e+01, percent-clipped=1.0 2024-08-12 06:42:40,465 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 06:42:41,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5800, loss[loss=0.09258, beats_loss=0.01231, ecapa_loss=0.0002179, whisper_loss=0.07809, over 18121.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01117, ecapa_loss=0.0001822, whisper_loss=0.0912, over 3876139.22 frames. ], batch size: 76, lr: 5.77e-03, grad_scale: 1.152921504606847e+18 2024-08-12 06:42:45,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1507260.0, ans=0.0 2024-08-12 06:42:45,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1507260.0, ans=0.0 2024-08-12 06:43:05,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1507360.0, ans=0.0 2024-08-12 06:43:09,883 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 06:43:11,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1507460.0, ans=0.125 2024-08-12 06:43:24,274 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 06:43:25,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1507560.0, ans=0.125 2024-08-12 06:43:31,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=1507560.0, ans=10.0 2024-08-12 06:43:39,889 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 22 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-12 06:43:40,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0 2024-08-12 06:43:43,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1507660.0, ans=0.0 2024-08-12 06:43:53,386 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 06:43:55,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1507760.0, ans=0.0 2024-08-12 06:43:55,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1507760.0, ans=0.125 2024-08-12 06:43:55,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5850, loss[loss=0.1125, beats_loss=0.01159, ecapa_loss=0.0001491, whisper_loss=0.09944, over 21896.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01112, ecapa_loss=0.0001811, whisper_loss=0.09176, over 3906889.80 frames. 
], batch size: 83, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:43:57,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1507760.0, ans=0.125 2024-08-12 06:44:04,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1507760.0, ans=0.0 2024-08-12 06:44:22,781 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 06:44:25,590 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.141e-01 2024-08-12 06:44:30,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-12 06:44:34,474 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 06:44:40,048 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 06:44:55,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.480e+01 2.777e+01 3.215e+01 5.489e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 06:45:01,939 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 06:45:06,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5900, loss[loss=0.09935, beats_loss=0.01108, ecapa_loss=0.00015, whisper_loss=0.08677, over 18248.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01113, ecapa_loss=0.0001819, whisper_loss=0.09154, over 3876145.34 frames. ], batch size: 72, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:45:09,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.89 vs. 
limit=15.0 2024-08-12 06:45:10,113 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 06:46:10,311 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 06:46:14,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1508660.0, ans=0.95 2024-08-12 06:46:16,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 5950, loss[loss=0.1081, beats_loss=0.01268, ecapa_loss=0.0001546, whisper_loss=0.09385, over 22982.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01114, ecapa_loss=0.0001815, whisper_loss=0.09166, over 3844362.17 frames. ], batch size: 91, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:46:33,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-12 06:47:02,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2024-08-12 06:47:11,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1509160.0, ans=0.125 2024-08-12 06:47:13,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1509160.0, ans=0.0 2024-08-12 06:47:15,166 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.604e+01 2.882e+01 3.325e+01 5.467e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 06:47:26,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6000, loss[loss=0.09773, beats_loss=0.01105, ecapa_loss=0.0001227, whisper_loss=0.08546, over 16327.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01113, ecapa_loss=0.000182, whisper_loss=0.09207, over 3850073.66 frames. 
], batch size: 62, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:47:26,640 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 06:48:09,272 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000598, whisper_loss=0.2484, over 922467.00 frames. 2024-08-12 06:48:27,612 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on SV_voxceleb1: loss=0.004893, beats_loss=0, ecapa_loss=0.0004893, whisper_loss=0, over 939242.00 frames. 2024-08-12 06:48:37,882 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6755, 2.3685, 2.2278, 2.4981], device='cuda:0') 2024-08-12 06:49:10,188 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8939, 4.1519, 2.6289, 4.6193], device='cuda:0') 2024-08-12 06:50:30,743 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on AT_audioset: loss=0.02461, beats_loss=0.02461, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 06:50:30,747 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 06:50:32,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1509260.0, ans=0.1 2024-08-12 06:50:43,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=1509260.0, ans=15.0 2024-08-12 06:50:44,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. 
limit=15.0 2024-08-12 06:50:55,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1509360.0, ans=0.0 2024-08-12 06:51:01,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1509460.0, ans=0.0 2024-08-12 06:51:08,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1509460.0, ans=0.1 2024-08-12 06:51:21,020 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 06:51:30,938 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-12 06:51:32,377 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 06:51:36,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1509660.0, ans=0.125 2024-08-12 06:51:40,788 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 06:51:41,936 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6050, loss[loss=0.09847, beats_loss=0.0112, ecapa_loss=0.0001616, whisper_loss=0.08565, over 15530.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01112, ecapa_loss=0.0001818, whisper_loss=0.09266, over 3905397.47 frames. ], batch size: 61, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:51:56,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1509860.0, ans=0.125 2024-08-12 06:52:03,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1509860.0, ans=10.0 2024-08-12 06:52:07,876 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 06:52:11,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1509960.0, ans=0.125 2024-08-12 06:52:13,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1509960.0, ans=0.125 2024-08-12 06:52:24,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1510060.0, ans=0.0 2024-08-12 06:52:31,694 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 06:52:33,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1510060.0, ans=0.0 2024-08-12 06:52:36,101 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 06:52:39,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.536e+01 2.768e+01 3.094e+01 4.494e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 06:52:44,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1510160.0, ans=0.07 2024-08-12 06:52:50,880 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6100, loss[loss=0.1191, beats_loss=0.009784, ecapa_loss=0.0002019, whisper_loss=0.1073, over 19084.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01112, ecapa_loss=0.0001812, whisper_loss=0.09216, over 3925273.31 frames. ], batch size: 75, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:52:58,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. 
limit=22.5 2024-08-12 06:53:15,680 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.079e-02 2024-08-12 06:53:22,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1510460.0, ans=0.0 2024-08-12 06:53:29,603 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 06:53:39,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1510560.0, ans=0.125 2024-08-12 06:53:45,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1510660.0, ans=0.0 2024-08-12 06:54:00,096 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6150, loss[loss=0.09579, beats_loss=0.01191, ecapa_loss=0.0001923, whisper_loss=0.08196, over 19507.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.000181, whisper_loss=0.09252, over 3930080.08 frames. ], batch size: 81, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:54:06,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-12 06:54:06,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2024-08-12 06:54:31,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1510960.0, ans=0.125 2024-08-12 06:54:42,161 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 06:54:49,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.11 vs. 
limit=22.5 2024-08-12 06:54:57,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.648e+01 2.944e+01 3.398e+01 5.258e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 06:54:59,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1511160.0, ans=0.125 2024-08-12 06:55:04,659 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-12 06:55:07,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1511260.0, ans=0.1 2024-08-12 06:55:08,375 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6200, loss[loss=0.1007, beats_loss=0.01199, ecapa_loss=0.0001649, whisper_loss=0.08701, over 22531.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01112, ecapa_loss=0.0001809, whisper_loss=0.09252, over 3922938.42 frames. ], batch size: 91, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:55:12,799 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 06:55:20,991 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 06:55:25,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1511360.0, ans=0.125 2024-08-12 06:55:32,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1511360.0, ans=0.1 2024-08-12 06:55:47,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1511460.0, ans=0.125 2024-08-12 06:55:50,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1511560.0, ans=0.0 2024-08-12 06:56:02,022 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 06:56:02,221 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 06:56:03,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1511660.0, ans=0.125 2024-08-12 06:56:03,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1511660.0, ans=0.125 2024-08-12 06:56:11,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1511660.0, ans=0.125 2024-08-12 06:56:17,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6250, loss[loss=0.09825, beats_loss=0.0112, ecapa_loss=0.000209, whisper_loss=0.08496, over 18518.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01111, ecapa_loss=0.0001819, whisper_loss=0.09172, over 3892190.99 frames. ], batch size: 76, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:56:20,131 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 06:56:25,860 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 06:56:28,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2024-08-12 06:56:36,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1511860.0, ans=10.0 2024-08-12 06:56:58,647 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
30 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 06:57:16,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.481e+01 2.801e+01 3.369e+01 5.530e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 06:57:18,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1512160.0, ans=0.1 2024-08-12 06:57:20,918 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 06:57:28,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6300, loss[loss=0.1179, beats_loss=0.008753, ecapa_loss=0.000222, whisper_loss=0.1069, over 21746.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.000182, whisper_loss=0.0923, over 3884750.11 frames. ], batch size: 92, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:57:32,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1512260.0, ans=0.2 2024-08-12 06:57:48,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1512360.0, ans=0.0 2024-08-12 06:58:01,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1512460.0, ans=0.1 2024-08-12 06:58:26,992 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 06:58:28,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1512660.0, ans=0.125 2024-08-12 06:58:40,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6350, loss[loss=0.1054, beats_loss=0.01044, ecapa_loss=0.0002243, whisper_loss=0.09273, over 19040.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001824, whisper_loss=0.09211, over 3854525.54 frames. 
], batch size: 79, lr: 5.76e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:58:44,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1512760.0, ans=0.0 2024-08-12 06:58:49,276 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 06:58:51,948 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 06:58:57,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=22.5 2024-08-12 06:58:58,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1512860.0, ans=0.2 2024-08-12 06:59:05,803 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 06:59:19,003 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-12 06:59:24,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1513060.0, ans=0.04949747468305833 2024-08-12 06:59:29,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2024-08-12 06:59:31,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.10 vs. 
limit=15.0 2024-08-12 06:59:42,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.576e+01 2.857e+01 3.160e+01 6.267e+01, threshold=5.713e+01, percent-clipped=1.0 2024-08-12 06:59:47,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1513160.0, ans=0.2 2024-08-12 06:59:50,052 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 06:59:53,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6400, loss[loss=0.1053, beats_loss=0.01255, ecapa_loss=0.0002038, whisper_loss=0.09076, over 22018.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01107, ecapa_loss=0.0001824, whisper_loss=0.09238, over 3890004.91 frames. ], batch size: 93, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 06:59:57,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1513260.0, ans=0.2 2024-08-12 06:59:59,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1513260.0, ans=0.125 2024-08-12 07:00:23,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1513460.0, ans=0.1 2024-08-12 07:00:24,325 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 07:00:27,404 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 07:00:28,936 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 07:00:46,844 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 07:00:50,005 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-12 07:01:07,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6450, loss[loss=0.08048, beats_loss=0.01428, ecapa_loss=0.0001676, whisper_loss=0.06452, over 21562.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001809, whisper_loss=0.0918, over 3914595.82 frames. ], batch size: 92, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:01:08,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1513760.0, ans=0.125 2024-08-12 07:01:23,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2024-08-12 07:01:35,261 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 07:01:45,721 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 07:01:49,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1513960.0, ans=0.2 2024-08-12 07:01:49,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1513960.0, ans=0.125 2024-08-12 07:01:57,929 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 07:02:05,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1514060.0, ans=0.0 2024-08-12 07:02:10,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.620e+01 2.898e+01 3.369e+01 4.608e+01, threshold=5.796e+01, percent-clipped=0.0 2024-08-12 07:02:15,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1514160.0, ans=0.04949747468305833 2024-08-12 07:02:18,143 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-12 07:02:22,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6500, loss[loss=0.12, beats_loss=0.01185, ecapa_loss=0.0001976, whisper_loss=0.1062, over 22169.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01106, ecapa_loss=0.0001813, whisper_loss=0.09307, over 3910637.32 frames. ], batch size: 88, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:02:41,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1514360.0, ans=0.125 2024-08-12 07:02:52,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1514460.0, ans=0.2 2024-08-12 07:03:01,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1514460.0, ans=0.2 2024-08-12 07:03:06,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1514460.0, ans=0.125 2024-08-12 07:03:10,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1514560.0, ans=0.125 2024-08-12 07:03:27,062 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 07:03:31,224 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 07:03:37,748 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6550, loss[loss=0.109, beats_loss=0.009578, ecapa_loss=0.000184, whisper_loss=0.09756, over 22955.00 frames. ], tot_loss[loss=0.1063, beats_loss=0.01107, ecapa_loss=0.0001801, whisper_loss=0.09344, over 3938240.41 frames. ], batch size: 90, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:03:46,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1514760.0, ans=0.0 2024-08-12 07:03:58,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1514860.0, ans=0.0 2024-08-12 07:04:00,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1514860.0, ans=0.5 2024-08-12 07:04:23,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1514960.0, ans=0.0 2024-08-12 07:04:47,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.605e+01 2.821e+01 3.389e+01 5.277e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 07:04:58,856 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-12 07:05:02,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6600, loss[loss=0.09531, beats_loss=0.01302, ecapa_loss=0.0001522, whisper_loss=0.08077, over 22055.00 frames. ], tot_loss[loss=0.1069, beats_loss=0.01101, ecapa_loss=0.0001804, whisper_loss=0.09409, over 3960493.89 frames. 
], batch size: 91, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:05:06,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.55 vs. limit=15.0 2024-08-12 07:05:34,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1515360.0, ans=0.035 2024-08-12 07:05:37,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1515460.0, ans=0.1 2024-08-12 07:05:39,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1515460.0, ans=0.2 2024-08-12 07:06:13,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1515660.0, ans=0.125 2024-08-12 07:06:15,280 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.360e-02 2024-08-12 07:06:22,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6650, loss[loss=0.1153, beats_loss=0.01016, ecapa_loss=0.0001613, whisper_loss=0.1036, over 22392.00 frames. ], tot_loss[loss=0.1067, beats_loss=0.01103, ecapa_loss=0.0001796, whisper_loss=0.09388, over 3985584.95 frames. ], batch size: 86, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:06:25,847 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 07:06:33,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1515760.0, ans=0.1 2024-08-12 07:06:44,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1515860.0, ans=0.2 2024-08-12 07:06:51,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.87 vs. limit=10.0 2024-08-12 07:06:52,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1515860.0, ans=0.125 2024-08-12 07:07:15,437 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 07:07:21,326 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-12 07:07:21,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1516060.0, ans=0.125 2024-08-12 07:07:30,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1516060.0, ans=0.125 2024-08-12 07:07:45,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1516160.0, ans=0.1 2024-08-12 07:07:48,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.716e+01 3.038e+01 3.399e+01 5.348e+01, threshold=6.076e+01, percent-clipped=0.0 2024-08-12 07:07:49,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-08-12 07:07:57,596 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 07:08:06,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6700, loss[loss=0.08986, beats_loss=0.01162, ecapa_loss=0.0001489, whisper_loss=0.07675, over 16507.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001798, whisper_loss=0.0931, over 3949660.48 frames. ], batch size: 61, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:08:06,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1516260.0, ans=0.125 2024-08-12 07:08:11,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1516260.0, ans=0.05 2024-08-12 07:08:14,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1516260.0, ans=0.0 2024-08-12 07:08:20,297 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 07:08:22,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-12 07:08:35,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1516360.0, ans=0.035 2024-08-12 07:08:37,118 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.897e+05 2024-08-12 07:09:11,280 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 07:09:11,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1516560.0, ans=0.125 2024-08-12 07:09:13,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1516560.0, ans=0.125 2024-08-12 07:09:13,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1516560.0, ans=0.1 2024-08-12 07:09:28,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1516660.0, ans=0.0 2024-08-12 07:09:33,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1516660.0, ans=0.2 2024-08-12 07:09:40,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-08-12 07:09:43,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6750, loss[loss=0.08969, beats_loss=0.01365, ecapa_loss=0.0001856, whisper_loss=0.07419, over 22088.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01106, ecapa_loss=0.00018, whisper_loss=0.09287, over 3918036.60 frames. 
], batch size: 92, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:09:54,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1516760.0, ans=0.125 2024-08-12 07:09:58,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1516760.0, ans=0.0 2024-08-12 07:10:04,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1516860.0, ans=0.0 2024-08-12 07:10:04,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1516860.0, ans=0.125 2024-08-12 07:10:32,863 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 07:10:51,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1517060.0, ans=0.2 2024-08-12 07:10:59,054 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-12 07:11:07,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.154e+01 2.593e+01 2.755e+01 3.178e+01 4.521e+01, threshold=5.509e+01, percent-clipped=0.0 2024-08-12 07:11:24,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6800, loss[loss=0.08668, beats_loss=0.01007, ecapa_loss=0.0002133, whisper_loss=0.07449, over 14864.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01107, ecapa_loss=0.0001812, whisper_loss=0.09207, over 3883145.18 frames. ], batch size: 59, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:11:36,809 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 07:11:39,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1517260.0, ans=0.125 2024-08-12 07:11:42,590 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 07:11:56,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1517360.0, ans=10.0 2024-08-12 07:11:57,968 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 07:12:06,656 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 07:12:07,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1517460.0, ans=0.2 2024-08-12 07:12:08,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1517460.0, ans=0.125 2024-08-12 07:12:09,161 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 07:12:24,535 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-12 07:12:28,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1517660.0, ans=0.1 2024-08-12 07:12:41,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6850, loss[loss=0.1136, beats_loss=0.01265, ecapa_loss=0.0001468, whisper_loss=0.09949, over 21094.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0111, ecapa_loss=0.0001804, whisper_loss=0.09106, over 3858932.55 frames. ], batch size: 84, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:12:57,732 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 07:13:01,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1517860.0, ans=0.125 2024-08-12 07:13:05,808 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 07:13:06,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1517860.0, ans=0.0 2024-08-12 07:13:16,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1517960.0, ans=0.1 2024-08-12 07:13:18,603 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 07:13:21,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1517960.0, ans=0.125 2024-08-12 07:13:21,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1517960.0, ans=0.125 2024-08-12 07:13:24,560 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 07:13:26,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1518060.0, ans=0.2 2024-08-12 07:13:36,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1518060.0, ans=0.125 2024-08-12 07:13:42,169 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+01 2.608e+01 2.859e+01 3.356e+01 1.905e+02, threshold=5.718e+01, percent-clipped=1.0 2024-08-12 07:13:53,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6900, loss[loss=0.1152, beats_loss=0.00838, ecapa_loss=0.0001953, whisper_loss=0.1048, over 21955.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01113, ecapa_loss=0.0001811, whisper_loss=0.09143, over 3883102.52 frames. ], batch size: 87, lr: 5.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:13:54,114 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 07:14:00,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1518260.0, ans=0.125 2024-08-12 07:14:15,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1518360.0, ans=0.0 2024-08-12 07:14:25,310 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 07:14:52,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-12 07:14:59,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-08-12 07:15:05,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 6950, loss[loss=0.1144, beats_loss=0.01064, ecapa_loss=0.0001771, whisper_loss=0.102, over 23731.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001814, whisper_loss=0.09185, over 3892267.91 frames. ], batch size: 91, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:15:15,551 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 07:15:18,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.38 vs. 
limit=10.0 2024-08-12 07:15:22,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1518860.0, ans=0.0 2024-08-12 07:15:46,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1519060.0, ans=0.125 2024-08-12 07:15:50,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1519060.0, ans=0.125 2024-08-12 07:16:01,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1519160.0, ans=0.0 2024-08-12 07:16:02,682 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 13 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-12 07:16:04,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.463e+01 2.859e+01 3.118e+01 2.003e+02, threshold=5.718e+01, percent-clipped=2.0 2024-08-12 07:16:14,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7000, loss[loss=0.1052, beats_loss=0.01013, ecapa_loss=0.0001725, whisper_loss=0.09339, over 16659.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01109, ecapa_loss=0.0001816, whisper_loss=0.09184, over 3890642.66 frames. ], batch size: 66, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:16:19,814 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.385e-01 2024-08-12 07:16:37,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-12 07:16:54,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. 
limit=15.0 2024-08-12 07:17:00,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1519560.0, ans=0.0 2024-08-12 07:17:04,662 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 07:17:13,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1519660.0, ans=0.125 2024-08-12 07:17:19,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1519660.0, ans=0.125 2024-08-12 07:17:25,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7050, loss[loss=0.09073, beats_loss=0.01293, ecapa_loss=0.0001549, whisper_loss=0.07625, over 17092.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01111, ecapa_loss=0.0001809, whisper_loss=0.09167, over 3909649.66 frames. ], batch size: 68, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:17:25,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1519760.0, ans=0.125 2024-08-12 07:17:36,698 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 07:17:59,189 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-152000.pt 2024-08-12 07:18:10,560 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
26 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-12 07:18:15,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1520060.0, ans=0.0 2024-08-12 07:18:28,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.499e+01 2.770e+01 3.110e+01 4.662e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 07:18:33,348 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 07:18:36,277 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 07:18:39,051 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 07:18:40,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7100, loss[loss=0.1148, beats_loss=0.01216, ecapa_loss=0.0001333, whisper_loss=0.1013, over 24501.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01118, ecapa_loss=0.0001794, whisper_loss=0.09108, over 3894857.83 frames. ], batch size: 94, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:18:46,222 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 07:18:47,481 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 07:18:50,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1520260.0, ans=0.0 2024-08-12 07:19:14,478 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 07:19:17,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1520460.0, ans=0.125 2024-08-12 07:19:23,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1520560.0, ans=0.07 2024-08-12 07:19:37,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2024-08-12 07:19:37,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-12 07:19:44,419 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 07:19:44,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1520660.0, ans=0.1 2024-08-12 07:19:46,501 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.608e-02 2024-08-12 07:19:54,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7150, loss[loss=0.07914, beats_loss=0.0135, ecapa_loss=0.0001659, whisper_loss=0.06398, over 23439.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01117, ecapa_loss=0.0001793, whisper_loss=0.09074, over 3885243.52 frames. ], batch size: 95, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:19:54,969 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 07:20:02,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-12 07:20:21,375 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 07:20:30,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1520960.0, ans=0.125 2024-08-12 07:20:32,422 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 07:20:42,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=12.0 2024-08-12 07:20:55,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.570e+01 2.914e+01 3.125e+01 1.770e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 07:21:07,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7200, loss[loss=0.1122, beats_loss=0.01059, ecapa_loss=0.0001781, whisper_loss=0.0998, over 23180.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01113, ecapa_loss=0.00018, whisper_loss=0.09105, over 3890548.35 frames. ], batch size: 90, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:21:39,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1521460.0, ans=0.2 2024-08-12 07:21:43,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1521460.0, ans=0.0 2024-08-12 07:21:55,709 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:22:02,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=12.0 2024-08-12 07:22:14,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.72 vs. 
limit=15.0 2024-08-12 07:22:21,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1521760.0, ans=0.0 2024-08-12 07:22:22,201 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7250, loss[loss=0.1225, beats_loss=0.009883, ecapa_loss=0.0002051, whisper_loss=0.1106, over 19049.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01111, ecapa_loss=0.0001801, whisper_loss=0.091, over 3884407.18 frames. ], batch size: 76, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:22:24,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1521760.0, ans=0.125 2024-08-12 07:22:52,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1521960.0, ans=0.125 2024-08-12 07:22:54,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1521960.0, ans=0.1 2024-08-12 07:22:57,059 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 07:23:01,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. 
limit=10.0 2024-08-12 07:23:24,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.489e+01 2.804e+01 3.145e+01 4.718e+01, threshold=5.607e+01, percent-clipped=0.0 2024-08-12 07:23:26,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1522160.0, ans=0.1 2024-08-12 07:23:31,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1522160.0, ans=0.1 2024-08-12 07:23:35,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1522260.0, ans=15.0 2024-08-12 07:23:36,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7300, loss[loss=0.09302, beats_loss=0.01259, ecapa_loss=0.0002005, whisper_loss=0.07842, over 20763.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01109, ecapa_loss=0.0001797, whisper_loss=0.09174, over 3874737.44 frames. ], batch size: 86, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:23:47,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2024-08-12 07:23:48,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1522260.0, ans=0.2 2024-08-12 07:24:34,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1522660.0, ans=0.125 2024-08-12 07:24:37,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1522660.0, ans=0.2 2024-08-12 07:24:40,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. 
limit=22.5 2024-08-12 07:24:43,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1522660.0, ans=0.1 2024-08-12 07:24:46,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1522660.0, ans=0.0 2024-08-12 07:24:49,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7350, loss[loss=0.1215, beats_loss=0.01106, ecapa_loss=0.0001769, whisper_loss=0.1087, over 22263.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.0001797, whisper_loss=0.09193, over 3863339.60 frames. ], batch size: 89, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:25:27,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1522960.0, ans=0.0 2024-08-12 07:25:41,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1523060.0, ans=0.0 2024-08-12 07:25:48,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.14 vs. limit=22.5 2024-08-12 07:25:52,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.567e+01 3.033e+01 3.476e+01 4.624e+01, threshold=6.066e+01, percent-clipped=0.0 2024-08-12 07:25:52,647 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 07:25:58,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1523160.0, ans=0.0 2024-08-12 07:26:01,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1523160.0, ans=0.125 2024-08-12 07:26:03,987 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7400, loss[loss=0.09354, beats_loss=0.01098, ecapa_loss=0.0001829, whisper_loss=0.08073, over 16765.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01112, ecapa_loss=0.0001789, whisper_loss=0.09141, over 3863728.42 frames. ], batch size: 68, lr: 5.74e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:26:07,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-08-12 07:26:09,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-08-12 07:26:16,189 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 07:26:34,854 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-12 07:26:38,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. 
limit=15.0 2024-08-12 07:26:48,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1523560.0, ans=0.0 2024-08-12 07:26:50,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1523560.0, ans=0.125 2024-08-12 07:26:53,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1523560.0, ans=0.1 2024-08-12 07:27:05,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-12 07:27:14,236 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 07:27:18,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7450, loss[loss=0.1036, beats_loss=0.009455, ecapa_loss=0.0001647, whisper_loss=0.09249, over 14276.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01106, ecapa_loss=0.0001796, whisper_loss=0.09205, over 3841048.31 frames. ], batch size: 55, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:27:24,599 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 07:27:24,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1523760.0, ans=0.125 2024-08-12 07:27:26,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1523760.0, ans=0.0 2024-08-12 07:27:35,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1523860.0, ans=0.125 2024-08-12 07:27:43,743 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 07:27:45,143 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 07:28:03,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1524060.0, ans=0.04949747468305833 2024-08-12 07:28:10,590 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 07:28:20,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.662e+01 2.945e+01 3.324e+01 4.940e+01, threshold=5.890e+01, percent-clipped=0.0 2024-08-12 07:28:31,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7500, loss[loss=0.07218, beats_loss=0.01286, ecapa_loss=0.0001732, whisper_loss=0.05759, over 13913.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.0001801, whisper_loss=0.09218, over 3876071.89 frames. ], batch size: 59, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:28:39,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1524260.0, ans=0.125 2024-08-12 07:28:48,116 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 07:28:51,399 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:28:59,617 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 07:29:00,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=12.0 2024-08-12 07:29:01,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1524460.0, ans=0.125 2024-08-12 07:29:28,354 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-12 07:29:32,407 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 07:29:43,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7550, loss[loss=0.08826, beats_loss=0.01026, ecapa_loss=0.0001589, whisper_loss=0.07641, over 15687.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01101, ecapa_loss=0.0001813, whisper_loss=0.0921, over 3848433.96 frames. ], batch size: 62, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:30:01,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1524860.0, ans=0.0 2024-08-12 07:30:14,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2024-08-12 07:30:20,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1524960.0, ans=0.1 2024-08-12 07:30:39,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=1525060.0, ans=15.0 2024-08-12 07:30:40,707 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 07:30:42,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1525160.0, ans=0.125 2024-08-12 07:30:43,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1525160.0, ans=0.0 2024-08-12 07:30:45,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
limit=15.0 2024-08-12 07:30:46,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.510e+01 2.746e+01 3.098e+01 2.240e+02, threshold=5.492e+01, percent-clipped=2.0 2024-08-12 07:30:54,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1525160.0, ans=0.0 2024-08-12 07:30:57,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1525260.0, ans=0.0 2024-08-12 07:30:58,403 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7600, loss[loss=0.1189, beats_loss=0.009927, ecapa_loss=0.0001193, whisper_loss=0.1077, over 20801.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001815, whisper_loss=0.09237, over 3847048.48 frames. ], batch size: 74, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:31:02,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1525260.0, ans=0.0 2024-08-12 07:31:06,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1525260.0, ans=0.0 2024-08-12 07:31:07,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1525260.0, ans=0.0 2024-08-12 07:31:07,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1525260.0, ans=0.125 2024-08-12 07:31:12,028 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 37 from Vox, 22 fro AS 2024-08-12 07:31:18,870 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 07:31:27,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=1525460.0, ans=0.02 2024-08-12 07:31:40,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1525460.0, ans=0.2 2024-08-12 07:31:52,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1525560.0, ans=0.04949747468305833 2024-08-12 07:31:59,037 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 07:31:59,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1525660.0, ans=10.0 2024-08-12 07:32:00,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1525660.0, ans=0.125 2024-08-12 07:32:06,557 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-12 07:32:08,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1525660.0, ans=0.1 2024-08-12 07:32:12,177 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7650, loss[loss=0.1104, beats_loss=0.008007, ecapa_loss=0.0001643, whisper_loss=0.1007, over 15485.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01079, ecapa_loss=0.0001818, whisper_loss=0.09264, over 3864458.36 frames. 
], batch size: 55, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:32:32,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1525860.0, ans=0.0 2024-08-12 07:32:40,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1525960.0, ans=0.0 2024-08-12 07:32:46,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2024-08-12 07:33:02,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-12 07:33:13,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.517e+01 2.819e+01 3.143e+01 1.705e+02, threshold=5.638e+01, percent-clipped=1.0 2024-08-12 07:33:13,180 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 07:33:20,712 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 07:33:25,139 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7700, loss[loss=0.1012, beats_loss=0.01142, ecapa_loss=0.000145, whisper_loss=0.08829, over 19838.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01077, ecapa_loss=0.0001819, whisper_loss=0.09288, over 3877603.41 frames. 
], batch size: 77, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:33:25,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1526260.0, ans=0.0 2024-08-12 07:33:35,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1526260.0, ans=0.0 2024-08-12 07:33:42,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1526360.0, ans=0.125 2024-08-12 07:33:59,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1526460.0, ans=0.1 2024-08-12 07:34:13,187 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 07:34:33,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1526660.0, ans=0.09899494936611666 2024-08-12 07:34:42,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7750, loss[loss=0.08028, beats_loss=0.01566, ecapa_loss=0.0001467, whisper_loss=0.06315, over 13183.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.000181, whisper_loss=0.0923, over 3879382.35 frames. 
], batch size: 54, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:34:44,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1526760.0, ans=0.0 2024-08-12 07:34:56,035 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.021e-01 2024-08-12 07:35:07,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1526860.0, ans=0.1 2024-08-12 07:35:08,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1526860.0, ans=0.2 2024-08-12 07:35:31,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1527060.0, ans=0.2 2024-08-12 07:35:44,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.468e+01 2.726e+01 3.182e+01 4.341e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 07:35:48,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-12 07:35:56,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7800, loss[loss=0.08952, beats_loss=0.009777, ecapa_loss=0.0001843, whisper_loss=0.0779, over 16321.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001798, whisper_loss=0.09173, over 3879917.60 frames. ], batch size: 65, lr: 5.73e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:36:24,673 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 07:36:33,668 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-12 07:36:39,741 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 07:36:43,647 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 07:37:04,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1527660.0, ans=0.0 2024-08-12 07:37:09,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7850, loss[loss=0.09528, beats_loss=0.0132, ecapa_loss=0.0001447, whisper_loss=0.08063, over 14154.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01104, ecapa_loss=0.0001778, whisper_loss=0.09178, over 3893844.19 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:37:15,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1527760.0, ans=0.125 2024-08-12 07:37:15,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1527760.0, ans=0.2 2024-08-12 07:37:24,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1527860.0, ans=0.0 2024-08-12 07:37:27,315 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 07:37:42,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1527960.0, ans=0.0 2024-08-12 07:37:47,629 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 07:37:52,516 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 07:37:52,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1527960.0, ans=0.0 2024-08-12 07:38:00,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1528060.0, ans=0.125 2024-08-12 07:38:02,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-12 07:38:07,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1528060.0, ans=0.0 2024-08-12 07:38:13,758 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.514e+01 2.915e+01 3.388e+01 6.482e+01, threshold=5.829e+01, percent-clipped=1.0 2024-08-12 07:38:25,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7900, loss[loss=0.1244, beats_loss=0.007769, ecapa_loss=0.0002094, whisper_loss=0.1145, over 15685.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001768, whisper_loss=0.09177, over 3888311.41 frames. ], batch size: 61, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:39:03,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2024-08-12 07:39:13,408 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 07:39:16,437 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 07:39:16,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1528560.0, ans=0.125 2024-08-12 07:39:37,939 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 7950, loss[loss=0.1232, beats_loss=0.00872, ecapa_loss=0.0001846, whisper_loss=0.1126, over 20942.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0112, ecapa_loss=0.0001763, whisper_loss=0.09126, over 3893587.89 frames. ], batch size: 78, lr: 5.73e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:39:49,904 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 07:39:50,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-12 07:39:56,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.37 vs. limit=22.5 2024-08-12 07:39:57,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1528860.0, ans=0.125 2024-08-12 07:40:04,487 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 07:40:15,540 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:40:40,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+01 2.548e+01 3.030e+01 3.373e+01 4.598e+01, threshold=6.060e+01, percent-clipped=0.0 2024-08-12 07:40:52,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8000, loss[loss=0.1188, beats_loss=0.009478, ecapa_loss=0.0001773, whisper_loss=0.1075, over 22723.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01119, ecapa_loss=0.0001752, whisper_loss=0.09131, over 3928854.30 frames. 
], batch size: 88, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:41:00,623 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:41:17,133 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 from AS 2024-08-12 07:41:46,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1529560.0, ans=0.0 2024-08-12 07:41:58,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1529660.0, ans=0.125 2024-08-12 07:42:07,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8050, loss[loss=0.09914, beats_loss=0.01165, ecapa_loss=0.0001682, whisper_loss=0.08581, over 21637.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01121, ecapa_loss=0.000177, whisper_loss=0.09132, over 3908319.24 frames. ], batch size: 87, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:42:14,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1529760.0, ans=0.1 2024-08-12 07:42:21,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1529860.0, ans=0.125 2024-08-12 07:43:08,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.431e+01 2.661e+01 3.080e+01 6.684e+01, threshold=5.323e+01, percent-clipped=1.0 2024-08-12 07:43:21,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8100, loss[loss=0.1062, beats_loss=0.01148, ecapa_loss=0.0001799, whisper_loss=0.0929, over 21365.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01122, ecapa_loss=0.0001762, whisper_loss=0.09148, over 3947683.62 frames. ], batch size: 84, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:43:37,328 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts.
18 from LS+wenet, 19 from Vox, 29 from AS 2024-08-12 07:43:47,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1530360.0, ans=0.07 2024-08-12 07:43:47,406 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.754e+05 2024-08-12 07:43:47,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-12 07:44:08,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1530560.0, ans=0.0 2024-08-12 07:44:22,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1530660.0, ans=0.125 2024-08-12 07:44:23,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1530660.0, ans=0.125 2024-08-12 07:44:23,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1530660.0, ans=0.0 2024-08-12 07:44:37,203 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8150, loss[loss=0.09459, beats_loss=0.01307, ecapa_loss=0.0001939, whisper_loss=0.07959, over 15423.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01121, ecapa_loss=0.0001768, whisper_loss=0.09104, over 3939785.63 frames. ], batch size: 63, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:44:58,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-12 07:45:05,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1530960.0, ans=0.125 2024-08-12 07:45:11,419 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
33 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 07:45:36,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1531160.0, ans=0.0 2024-08-12 07:45:38,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.559e+01 2.858e+01 3.192e+01 6.698e+01, threshold=5.715e+01, percent-clipped=1.0 2024-08-12 07:45:43,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1531160.0, ans=0.125 2024-08-12 07:45:50,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8200, loss[loss=0.09007, beats_loss=0.01103, ecapa_loss=0.0002218, whisper_loss=0.07682, over 13831.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001796, whisper_loss=0.09152, over 3939644.98 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 1.152921504606847e+18 2024-08-12 07:45:53,570 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 23 from Vox, 20 from AS 2024-08-12 07:45:53,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1531260.0, ans=0.0 2024-08-12 07:46:09,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1531360.0, ans=0.1 2024-08-12 07:46:17,784 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.467e+05 2024-08-12 07:46:34,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1531560.0, ans=0.0 2024-08-12 07:46:59,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs.
limit=15.0 2024-08-12 07:46:59,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-08-12 07:47:01,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8250, loss[loss=0.1089, beats_loss=0.0112, ecapa_loss=0.0001713, whisper_loss=0.09594, over 22956.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01114, ecapa_loss=0.0001794, whisper_loss=0.09146, over 3905424.97 frames. ], batch size: 92, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:47:06,061 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 from AS 2024-08-12 07:47:15,128 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-12 07:47:30,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:47:54,092 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 07:47:57,199 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 22 from Vox, 48 from AS 2024-08-12 07:48:03,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.593e+01 2.850e+01 3.386e+01 5.334e+01, threshold=5.700e+01, percent-clipped=0.0 2024-08-12 07:48:14,108 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8300, loss[loss=0.08888, beats_loss=0.01317, ecapa_loss=0.0001706, whisper_loss=0.074, over 16175.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001787, whisper_loss=0.09183, over 3919027.22 frames. ], batch size: 66, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:48:22,594 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts.
26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-12 07:48:46,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1532460.0, ans=0.05 2024-08-12 07:48:55,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1532560.0, ans=0.125 2024-08-12 07:48:56,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.35 vs. limit=5.0 2024-08-12 07:49:23,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8350, loss[loss=0.1063, beats_loss=0.01356, ecapa_loss=0.000115, whisper_loss=0.09158, over 22465.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01108, ecapa_loss=0.0001776, whisper_loss=0.09235, over 3922373.69 frames. ], batch size: 88, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:49:33,494 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:49:36,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1532860.0, ans=0.1 2024-08-12 07:49:46,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1532860.0, ans=0.0 2024-08-12 07:49:49,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1532860.0, ans=0.2 2024-08-12 07:49:56,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1532960.0, ans=0.1 2024-08-12 07:50:21,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1533160.0, ans=0.0 2024-08-12 07:50:23,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.502e+01 2.920e+01 3.300e+01 7.763e+01,
threshold=5.841e+01, percent-clipped=2.0 2024-08-12 07:50:33,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8400, loss[loss=0.08539, beats_loss=0.01195, ecapa_loss=0.0001663, whisper_loss=0.07178, over 15255.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.000178, whisper_loss=0.09196, over 3908490.90 frames. ], batch size: 58, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:50:50,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1533360.0, ans=0.1 2024-08-12 07:50:51,225 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 from AS 2024-08-12 07:50:52,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1533360.0, ans=0.125 2024-08-12 07:51:11,376 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 40 from LS+wenet, 17 from Vox, 34 from AS 2024-08-12 07:51:11,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1533460.0, ans=0.125 2024-08-12 07:51:17,939 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 19 from Vox, 33 from AS 2024-08-12 07:51:24,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1533560.0, ans=0.1 2024-08-12 07:51:27,094 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 15 from Vox, 16 from AS 2024-08-12 07:51:38,390 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 07:51:38,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1533660.0, ans=10.0 2024-08-12 07:51:39,857 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
25 from LS+wenet, 22 from Vox, 24 from AS 2024-08-12 07:51:45,276 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8450, loss[loss=0.08291, beats_loss=0.01378, ecapa_loss=0.0001752, whisper_loss=0.06738, over 21091.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.0001784, whisper_loss=0.09219, over 3903510.36 frames. ], batch size: 90, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:51:45,755 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 07:51:47,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-12 07:51:48,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-08-12 07:51:55,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1533760.0, ans=0.1 2024-08-12 07:52:01,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1533860.0, ans=0.025 2024-08-12 07:52:15,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2024-08-12 07:52:28,595 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 14 from Vox, 42 from AS 2024-08-12 07:52:31,264 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 07:52:34,329 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts.
31 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 07:52:35,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1534060.0, ans=0.1 2024-08-12 07:52:37,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1534060.0, ans=0.2 2024-08-12 07:52:46,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.460e+01 2.717e+01 3.180e+01 4.918e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 07:52:51,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1534160.0, ans=0.1 2024-08-12 07:52:56,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8500, loss[loss=0.112, beats_loss=0.01292, ecapa_loss=0.0001568, whisper_loss=0.09756, over 22844.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01101, ecapa_loss=0.0001791, whisper_loss=0.0923, over 3899445.34 frames. ], batch size: 90, lr: 5.72e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:53:06,492 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 from AS 2024-08-12 07:53:09,481 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 10 from Vox, 23 from AS 2024-08-12 07:53:11,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-12 07:53:15,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1534360.0, ans=0.125 2024-08-12 07:53:18,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2024-08-12 07:53:22,400 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts.
25 from LS+wenet, 18 from Vox, 29 from AS 2024-08-12 07:53:22,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1534360.0, ans=0.025 2024-08-12 07:53:26,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1534460.0, ans=0.0 2024-08-12 07:53:36,136 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-12 07:53:42,142 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 07:53:56,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1534660.0, ans=0.2 2024-08-12 07:54:07,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8550, loss[loss=0.1262, beats_loss=0.009923, ecapa_loss=0.0001589, whisper_loss=0.1147, over 18201.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01104, ecapa_loss=0.0001791, whisper_loss=0.09234, over 3887189.43 frames.
], batch size: 67, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:54:29,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1534860.0, ans=0.125 2024-08-12 07:54:36,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1534960.0, ans=0.1 2024-08-12 07:54:58,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1535060.0, ans=0.125 2024-08-12 07:55:00,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1535060.0, ans=0.1 2024-08-12 07:55:09,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+01 2.569e+01 2.940e+01 3.392e+01 6.119e+01, threshold=5.880e+01, percent-clipped=2.0 2024-08-12 07:55:15,268 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 07:55:19,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8600, loss[loss=0.1014, beats_loss=0.01375, ecapa_loss=0.0001899, whisper_loss=0.08571, over 17420.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01101, ecapa_loss=0.0001782, whisper_loss=0.09285, over 3911989.99 frames. ], batch size: 71, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:55:23,686 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 from AS 2024-08-12 07:55:40,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1535360.0, ans=0.125 2024-08-12 07:55:48,141 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 from AS 2024-08-12 07:56:31,649 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8650, loss[loss=0.1219, beats_loss=0.00888, ecapa_loss=0.0001707, whisper_loss=0.1113, over 23899.00 frames.
], tot_loss[loss=0.1054, beats_loss=0.01104, ecapa_loss=0.0001789, whisper_loss=0.09253, over 3875328.97 frames. ], batch size: 92, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:56:45,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1535860.0, ans=0.07 2024-08-12 07:56:51,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1535860.0, ans=0.125 2024-08-12 07:57:00,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1535960.0, ans=0.1 2024-08-12 07:57:34,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.601e+01 2.885e+01 3.263e+01 5.509e+01, threshold=5.770e+01, percent-clipped=0.0 2024-08-12 07:57:35,150 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 from AS 2024-08-12 07:57:35,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1536160.0, ans=0.125 2024-08-12 07:57:39,256 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 15 from Vox, 19 from AS 2024-08-12 07:57:40,767 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 22 from Vox, 33 from AS 2024-08-12 07:57:42,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1536160.0, ans=0.2 2024-08-12 07:57:45,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8700, loss[loss=0.09498, beats_loss=0.01223, ecapa_loss=0.0001593, whisper_loss=0.08116, over 19865.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001807, whisper_loss=0.09274, over 3878737.73 frames.
], batch size: 81, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:57:51,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2024-08-12 07:58:01,478 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 from AS 2024-08-12 07:58:03,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1536360.0, ans=0.2 2024-08-12 07:58:08,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1536360.0, ans=0.2 2024-08-12 07:58:17,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1536460.0, ans=0.125 2024-08-12 07:58:30,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1536560.0, ans=0.1 2024-08-12 07:58:32,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1536560.0, ans=0.0 2024-08-12 07:58:39,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1536560.0, ans=0.1 2024-08-12 07:58:52,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs.
limit=6.0 2024-08-12 07:58:55,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1536660.0, ans=0.125 2024-08-12 07:58:55,350 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.102e-03 2024-08-12 07:58:57,670 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8750, loss[loss=0.1329, beats_loss=0.007851, ecapa_loss=0.0001767, whisper_loss=0.1233, over 18320.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001804, whisper_loss=0.09294, over 3863926.19 frames. ], batch size: 69, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 07:59:41,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1537060.0, ans=0.125 2024-08-12 07:59:42,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1537060.0, ans=0.2 2024-08-12 07:59:59,291 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.459e+01 2.752e+01 3.200e+01 4.704e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 08:00:09,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8800, loss[loss=0.09564, beats_loss=0.01085, ecapa_loss=0.0001819, whisper_loss=0.08297, over 22189.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01093, ecapa_loss=0.0001809, whisper_loss=0.09252, over 3845803.24 frames. ], batch size: 91, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:00:37,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1537460.0, ans=0.05 2024-08-12 08:00:44,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.96 vs. 
limit=15.0 2024-08-12 08:00:48,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1537460.0, ans=0.125 2024-08-12 08:00:48,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1537460.0, ans=0.125 2024-08-12 08:00:50,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1537460.0, ans=0.125 2024-08-12 08:00:57,214 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 08:00:58,747 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 from AS 2024-08-12 08:00:58,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1537560.0, ans=0.125 2024-08-12 08:01:08,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1537660.0, ans=0.0 2024-08-12 08:01:22,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8850, loss[loss=0.1114, beats_loss=0.01215, ecapa_loss=0.0001532, whisper_loss=0.09773, over 22253.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01107, ecapa_loss=0.0001794, whisper_loss=0.0921, over 3860398.11 frames. ], batch size: 88, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:01:22,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1537760.0, ans=0.125 2024-08-12 08:01:33,422 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts.
27 from LS+wenet, 17 from Vox, 27 from AS 2024-08-12 08:01:50,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1537960.0, ans=0.0 2024-08-12 08:01:51,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2024-08-12 08:02:26,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.513e+01 2.817e+01 3.159e+01 3.465e+02, threshold=5.633e+01, percent-clipped=4.0 2024-08-12 08:02:35,633 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 from AS 2024-08-12 08:02:36,973 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8900, loss[loss=0.09108, beats_loss=0.01221, ecapa_loss=0.0001604, whisper_loss=0.07726, over 20445.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.00018, whisper_loss=0.09147, over 3834906.98 frames. ], batch size: 84, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:02:41,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1538260.0, ans=0.0 2024-08-12 08:02:43,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1538260.0, ans=0.125 2024-08-12 08:02:44,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1538260.0, ans=0.125 2024-08-12 08:02:54,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1538360.0, ans=0.2 2024-08-12 08:02:57,394 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
21 from LS+wenet, 24 from Vox, 24 from AS 2024-08-12 08:03:19,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1538560.0, ans=0.125 2024-08-12 08:03:50,472 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 8950, loss[loss=0.09947, beats_loss=0.0101, ecapa_loss=0.000162, whisper_loss=0.08775, over 15475.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001789, whisper_loss=0.09197, over 3860910.69 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:03:56,576 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 from AS 2024-08-12 08:04:43,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1539060.0, ans=0.1 2024-08-12 08:04:44,235 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 from AS 2024-08-12 08:04:44,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1539060.0, ans=0.04949747468305833 2024-08-12 08:04:47,130 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS 2024-08-12 08:04:52,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.556e+01 2.825e+01 3.281e+01 7.768e+01, threshold=5.651e+01, percent-clipped=1.0 2024-08-12 08:05:02,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9000, loss[loss=0.1029, beats_loss=0.0123, ecapa_loss=0.000202, whisper_loss=0.08857, over 22430.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01107, ecapa_loss=0.00018, whisper_loss=0.09213, over 3892493.72 frames.
], batch size: 91, lr: 5.71e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:05:02,459 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 08:05:41,683 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on ASR_libri: loss=0.2556, beats_loss=0, ecapa_loss=0.0006109, whisper_loss=0.2495, over 922467.00 frames. 2024-08-12 08:05:59,686 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on SV_voxceleb1: loss=0.004943, beats_loss=0, ecapa_loss=0.0004943, whisper_loss=0, over 939242.00 frames. 2024-08-12 08:07:52,932 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on AT_audioset: loss=0.02436, beats_loss=0.02436, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 08:07:52,937 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 08:07:55,878 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 from AS 2024-08-12 08:07:57,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1539260.0, ans=0.125 2024-08-12 08:08:00,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1539260.0, ans=0.0 2024-08-12 08:08:01,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.54 vs. limit=10.0 2024-08-12 08:08:06,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1539360.0, ans=0.125 2024-08-12 08:08:07,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0 2024-08-12 08:08:11,740 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
18 from LS+wenet, 31 from Vox, 36 from AS 2024-08-12 08:08:38,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2024-08-12 08:08:40,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1539560.0, ans=0.125 2024-08-12 08:08:49,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1539560.0, ans=0.125 2024-08-12 08:08:58,478 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 11 from Vox, 33 from AS 2024-08-12 08:09:01,774 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 from AS 2024-08-12 08:09:05,573 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9050, loss[loss=0.103, beats_loss=0.01473, ecapa_loss=0.0001188, whisper_loss=0.08708, over 19298.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001795, whisper_loss=0.09235, over 3876718.76 frames. ], batch size: 75, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:09:05,833 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 39 from LS+wenet, 21 from Vox, 30 from AS 2024-08-12 08:09:13,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1539760.0, ans=0.0 2024-08-12 08:09:19,437 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
28 from LS+wenet, 25 from Vox, 40 from AS 2024-08-12 08:09:32,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1539860.0, ans=0.0 2024-08-12 08:09:50,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1540060.0, ans=0.125 2024-08-12 08:09:50,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1540060.0, ans=0.0 2024-08-12 08:09:55,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1540060.0, ans=0.0 2024-08-12 08:10:01,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-12 08:10:05,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1540160.0, ans=0.0 2024-08-12 08:10:06,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1540160.0, ans=0.2 2024-08-12 08:10:09,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.551e+01 2.907e+01 3.420e+01 5.824e+01, threshold=5.813e+01, percent-clipped=1.0 2024-08-12 08:10:11,161 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 from AS 2024-08-12 08:10:18,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1540260.0, ans=0.125 2024-08-12 08:10:19,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9100, loss[loss=0.1005, beats_loss=0.01085, ecapa_loss=0.0002034, whisper_loss=0.0876, over 21203.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01094, ecapa_loss=0.0001814, whisper_loss=0.09326, over 3910995.58 frames. 
], batch size: 92, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:10:33,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-12 08:10:36,201 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 from AS 2024-08-12 08:10:39,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1540360.0, ans=0.95 2024-08-12 08:10:49,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-08-12 08:10:51,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-12 08:10:55,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1540460.0, ans=0.1 2024-08-12 08:11:08,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1540560.0, ans=0.125 2024-08-12 08:11:12,418 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 08:11:12,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1540560.0, ans=0.125 2024-08-12 08:11:18,665 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 22 from Vox, 37 from AS 2024-08-12 08:11:33,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9150, loss[loss=0.1071, beats_loss=0.01233, ecapa_loss=0.000183, whisper_loss=0.09298, over 21494.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01102, ecapa_loss=0.0001806, whisper_loss=0.09301, over 3941313.39 frames. 
], batch size: 89, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:11:50,141 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 from AS 2024-08-12 08:11:50,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-12 08:11:53,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=22.5 2024-08-12 08:12:00,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=22.5 2024-08-12 08:12:17,408 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 22 from Vox, 19 from AS 2024-08-12 08:12:35,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.112e+01 2.613e+01 2.813e+01 3.154e+01 4.389e+01, threshold=5.626e+01, percent-clipped=0.0 2024-08-12 08:12:39,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1541160.0, ans=0.1 2024-08-12 08:12:40,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1541160.0, ans=0.0 2024-08-12 08:12:46,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9200, loss[loss=0.1245, beats_loss=0.01116, ecapa_loss=0.0001553, whisper_loss=0.1118, over 23216.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01111, ecapa_loss=0.0001798, whisper_loss=0.09246, over 3928937.48 frames. ], batch size: 89, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:12:48,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. 
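The optim.py Clipping_scale lines above report five grad-norm quantiles (min, 25%, median, 75%, max) and are consistent with a clipping threshold equal to Clipping_scale times the median grad-norm (e.g. 2.0 × 2.813e+01 = 5.626e+01, matching the logged threshold exactly). A minimal sketch under that assumption (the function name is hypothetical, not from optim.py):

```python
def clipping_threshold(grad_norm_median, clipping_scale=2.0):
    """Hypothetical reconstruction of the threshold rule implied by the log:
    threshold = clipping_scale * median grad-norm."""
    return clipping_scale * grad_norm_median

# Cross-check against the 08:12:35 entry: quartiles ... 2.813e+01 (median) ...,
# threshold=5.626e+01 with Clipping_scale=2.0
assert abs(clipping_threshold(2.813e+01) - 5.626e+01) < 1e-9
```

Under this reading, percent-clipped is simply the fraction of recent batches whose grad-norm exceeded that adaptive threshold.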
limit=15.0 2024-08-12 08:13:05,163 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 from AS 2024-08-12 08:13:17,463 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 31 from Vox, 33 from AS 2024-08-12 08:13:31,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1541560.0, ans=0.2 2024-08-12 08:13:34,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.49 vs. limit=15.0 2024-08-12 08:13:44,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1541660.0, ans=0.125 2024-08-12 08:13:57,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9250, loss[loss=0.09432, beats_loss=0.01152, ecapa_loss=0.0001946, whisper_loss=0.08085, over 22792.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01109, ecapa_loss=0.0001818, whisper_loss=0.09269, over 3959147.59 frames. ], batch size: 94, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:14:09,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1541760.0, ans=0.125 2024-08-12 08:14:14,563 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 from AS 2024-08-12 08:14:18,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-08-12 08:14:32,954 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 from AS 2024-08-12 08:14:34,222 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
33 from LS+wenet, 20 from Vox, 39 from AS 2024-08-12 08:14:47,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1542060.0, ans=0.0 2024-08-12 08:15:00,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.527e+01 2.843e+01 3.278e+01 5.057e+01, threshold=5.687e+01, percent-clipped=0.0 2024-08-12 08:15:05,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1542160.0, ans=0.125 2024-08-12 08:15:10,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1542260.0, ans=0.0 2024-08-12 08:15:10,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1542260.0, ans=0.0 2024-08-12 08:15:10,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1542260.0, ans=0.125 2024-08-12 08:15:11,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9300, loss[loss=0.1097, beats_loss=0.01135, ecapa_loss=0.0001773, whisper_loss=0.09656, over 21468.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01106, ecapa_loss=0.0001826, whisper_loss=0.09248, over 3925599.16 frames. ], batch size: 84, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:15:21,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1542260.0, ans=0.1 2024-08-12 08:15:22,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1542260.0, ans=0.0 2024-08-12 08:15:33,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1542360.0, ans=0.125 2024-08-12 08:15:35,948 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
29 from LS+wenet, 29 from Vox, 36 from AS 2024-08-12 08:15:43,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2024-08-12 08:16:09,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1542660.0, ans=0.125 2024-08-12 08:16:23,413 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 19 from LS+wenet, 19 from Vox, 44 from AS 2024-08-12 08:16:26,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9350, loss[loss=0.1211, beats_loss=0.01075, ecapa_loss=0.0001429, whisper_loss=0.1089, over 23685.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01113, ecapa_loss=0.0001816, whisper_loss=0.09195, over 3902487.62 frames. ], batch size: 89, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:16:54,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1542860.0, ans=0.125 2024-08-12 08:16:55,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1542960.0, ans=0.025 2024-08-12 08:17:10,635 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 21 from Vox, 44 from AS 2024-08-12 08:17:31,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1543160.0, ans=0.2 2024-08-12 08:17:31,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.553e+01 2.819e+01 3.364e+01 6.243e+01, threshold=5.639e+01, percent-clipped=2.0 2024-08-12 08:17:33,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1543160.0, ans=0.0 2024-08-12 08:17:39,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1543160.0, ans=0.125 2024-08-12 08:17:43,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9400, loss[loss=0.1196, beats_loss=0.01062, ecapa_loss=0.0001918, whisper_loss=0.1071, over 23795.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01119, ecapa_loss=0.00018, whisper_loss=0.09132, over 3888908.45 frames. ], batch size: 95, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:17:57,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1543360.0, ans=0.125 2024-08-12 08:18:02,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1543360.0, ans=0.125 2024-08-12 08:18:15,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1543460.0, ans=0.125 2024-08-12 08:18:37,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1543560.0, ans=0.125 2024-08-12 08:18:48,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1543660.0, ans=0.125 2024-08-12 08:18:55,243 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1543660.0, ans=0.0 2024-08-12 08:18:58,834 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9450, loss[loss=0.1066, beats_loss=0.0129, ecapa_loss=0.0001409, whisper_loss=0.09229, over 18179.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01116, ecapa_loss=0.0001803, whisper_loss=0.09129, over 3891973.72 frames. ], batch size: 70, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:19:04,755 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 21 from Vox, 41 from AS 2024-08-12 08:19:11,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1543760.0, ans=15.0 2024-08-12 08:19:19,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1543860.0, ans=0.0 2024-08-12 08:19:32,887 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 14 from Vox, 39 from AS 2024-08-12 08:19:34,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1543960.0, ans=0.125 2024-08-12 08:19:38,239 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 from AS 2024-08-12 08:19:52,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1544060.0, ans=0.125 2024-08-12 08:20:01,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.483e+01 2.821e+01 3.317e+01 4.965e+01, threshold=5.642e+01, percent-clipped=0.0 2024-08-12 08:20:06,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1544160.0, ans=0.0 2024-08-12 08:20:08,787 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
14 from LS+wenet, 24 from Vox, 31 from AS 2024-08-12 08:20:11,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9500, loss[loss=0.1139, beats_loss=0.0098, ecapa_loss=0.0002095, whisper_loss=0.102, over 20937.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01116, ecapa_loss=0.0001814, whisper_loss=0.09128, over 3890697.32 frames. ], batch size: 89, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:20:21,365 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 25 from Vox, 38 from AS 2024-08-12 08:20:31,691 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 from AS 2024-08-12 08:20:34,380 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 13 from Vox, 35 from AS 2024-08-12 08:20:44,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1544460.0, ans=0.1 2024-08-12 08:20:48,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-12 08:21:05,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1544560.0, ans=0.125 2024-08-12 08:21:10,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1544660.0, ans=0.125 2024-08-12 08:21:22,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.16 vs. limit=10.0 2024-08-12 08:21:24,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9550, loss[loss=0.09203, beats_loss=0.009998, ecapa_loss=0.0001809, whisper_loss=0.08022, over 17965.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001806, whisper_loss=0.09189, over 3875612.89 frames. 
], batch size: 71, lr: 5.70e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:21:39,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1544860.0, ans=0.125 2024-08-12 08:21:53,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-12 08:21:58,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1544960.0, ans=0.0 2024-08-12 08:22:00,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1544960.0, ans=0.125 2024-08-12 08:22:08,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1545060.0, ans=0.2 2024-08-12 08:22:24,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1545160.0, ans=0.125 2024-08-12 08:22:26,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.581e+01 2.910e+01 3.415e+01 4.856e+01, threshold=5.819e+01, percent-clipped=0.0 2024-08-12 08:22:27,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1545160.0, ans=0.2 2024-08-12 08:22:36,584 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9600, loss[loss=0.1142, beats_loss=0.01164, ecapa_loss=0.0001794, whisper_loss=0.1008, over 22633.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01097, ecapa_loss=0.0001803, whisper_loss=0.09205, over 3871190.53 frames. ], batch size: 89, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:22:52,771 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
30 from LS+wenet, 25 from Vox, 29 from AS 2024-08-12 08:23:00,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1545360.0, ans=0.125 2024-08-12 08:23:11,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1545460.0, ans=0.125 2024-08-12 08:23:26,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1545560.0, ans=0.0 2024-08-12 08:23:29,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1545560.0, ans=0.2 2024-08-12 08:23:49,498 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9650, loss[loss=0.1301, beats_loss=0.009854, ecapa_loss=0.0001545, whisper_loss=0.1187, over 15673.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001807, whisper_loss=0.09232, over 3850863.48 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:23:52,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1545760.0, ans=0.09899494936611666 2024-08-12 08:24:06,762 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 from AS 2024-08-12 08:24:10,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1545860.0, ans=15.0 2024-08-12 08:24:11,012 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 08:24:14,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1545860.0, ans=0.0 2024-08-12 08:24:23,499 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
31 from LS+wenet, 22 from Vox, 25 from AS 2024-08-12 08:24:27,762 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-12 08:24:36,606 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 from AS 2024-08-12 08:24:50,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.490e+01 2.776e+01 3.280e+01 4.565e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 08:24:50,646 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 11 from Vox, 25 from AS 2024-08-12 08:24:54,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1546160.0, ans=0.2 2024-08-12 08:25:00,801 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9700, loss[loss=0.09906, beats_loss=0.007542, ecapa_loss=0.0002059, whisper_loss=0.08946, over 15833.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.000182, whisper_loss=0.0922, over 3839893.52 frames. ], batch size: 62, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:25:08,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1546260.0, ans=0.2 2024-08-12 08:25:24,727 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 15 from LS+wenet, 25 from Vox, 52 from AS 2024-08-12 08:25:40,762 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 20 from Vox, 22 from AS 2024-08-12 08:25:45,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1546560.0, ans=0.125 2024-08-12 08:25:54,240 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 from AS 2024-08-12 08:26:13,097 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9750, loss[loss=0.1015, beats_loss=0.0084, ecapa_loss=0.0002187, whisper_loss=0.09087, over 17066.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.0111, ecapa_loss=0.0001816, whisper_loss=0.09137, over 3861323.10 frames. ], batch size: 68, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:26:23,390 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-12 08:26:36,121 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 29 from Vox, 34 from AS 2024-08-12 08:26:44,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1546960.0, ans=0.125 2024-08-12 08:27:05,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1547060.0, ans=0.125 2024-08-12 08:27:05,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1547060.0, ans=0.0 2024-08-12 08:27:08,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:27:10,064 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 08:27:16,899 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.437e+01 2.801e+01 3.445e+01 6.244e+01, threshold=5.602e+01, percent-clipped=1.0 2024-08-12 08:27:26,101 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 from AS 2024-08-12 08:27:26,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1547260.0, ans=0.125 2024-08-12 08:27:27,233 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9800, loss[loss=0.1061, beats_loss=0.01072, ecapa_loss=0.0001678, whisper_loss=0.09369, over 17191.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001813, whisper_loss=0.09151, over 3845536.33 frames. 
], batch size: 69, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:27:40,728 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 9 from Vox, 24 from AS 2024-08-12 08:27:42,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1547360.0, ans=0.125 2024-08-12 08:27:48,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-12 08:27:55,996 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 from AS 2024-08-12 08:27:57,567 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:27:57,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1547460.0, ans=0.5 2024-08-12 08:28:27,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1547660.0, ans=0.05 2024-08-12 08:28:42,415 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9850, loss[loss=0.1255, beats_loss=0.0112, ecapa_loss=0.0001695, whisper_loss=0.1126, over 22450.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001808, whisper_loss=0.09245, over 3851919.10 frames. 
], batch size: 92, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:28:48,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1547760.0, ans=0.125 2024-08-12 08:28:59,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1547860.0, ans=0.125 2024-08-12 08:29:02,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-08-12 08:29:05,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1547860.0, ans=0.2 2024-08-12 08:29:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1547960.0, ans=0.1 2024-08-12 08:29:16,079 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 15 from Vox, 42 from AS 2024-08-12 08:29:47,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.546e+01 2.857e+01 3.272e+01 5.247e+01, threshold=5.713e+01, percent-clipped=0.0 2024-08-12 08:29:50,697 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-12 08:29:55,278 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS 2024-08-12 08:29:57,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9900, loss[loss=0.1017, beats_loss=0.01376, ecapa_loss=0.0001716, whisper_loss=0.08623, over 18692.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01114, ecapa_loss=0.0001807, whisper_loss=0.09184, over 3850577.25 frames. 
], batch size: 76, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:30:23,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1548360.0, ans=0.125 2024-08-12 08:30:27,451 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 from AS 2024-08-12 08:30:34,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1548460.0, ans=0.2 2024-08-12 08:30:43,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1548560.0, ans=0.125 2024-08-12 08:30:43,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1548560.0, ans=0.0 2024-08-12 08:30:55,287 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS 2024-08-12 08:31:10,878 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 9950, loss[loss=0.1013, beats_loss=0.01002, ecapa_loss=0.0001611, whisper_loss=0.08965, over 14915.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001802, whisper_loss=0.09206, over 3849922.41 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:31:11,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1548760.0, ans=0.1 2024-08-12 08:31:11,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2024-08-12 08:31:15,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.42 vs. 
limit=22.5 2024-08-12 08:31:18,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2024-08-12 08:31:24,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1548860.0, ans=0.0 2024-08-12 08:31:27,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1548860.0, ans=0.125 2024-08-12 08:31:29,431 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 from AS 2024-08-12 08:31:31,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1548860.0, ans=6.0 2024-08-12 08:31:40,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1548960.0, ans=0.125 2024-08-12 08:31:50,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1548960.0, ans=0.125 2024-08-12 08:31:59,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1549060.0, ans=0.125 2024-08-12 08:32:03,780 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:32:13,525 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 08:32:14,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.498e+01 2.780e+01 3.249e+01 5.152e+01, threshold=5.559e+01, percent-clipped=0.0 2024-08-12 08:32:24,577 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10000, loss[loss=0.1099, beats_loss=0.009556, ecapa_loss=0.0001508, whisper_loss=0.09881, over 17621.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01104, ecapa_loss=0.0001803, whisper_loss=0.09214, over 3844872.35 frames. ], batch size: 65, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:32:25,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1549260.0, ans=0.125 2024-08-12 08:32:32,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1549260.0, ans=0.125 2024-08-12 08:33:02,734 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 20 from Vox, 23 from AS 2024-08-12 08:33:16,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-12 08:33:17,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1549560.0, ans=0.0 2024-08-12 08:33:18,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-08-12 08:33:38,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10050, loss[loss=0.1234, beats_loss=0.01129, ecapa_loss=0.0001612, whisper_loss=0.1105, over 19898.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001807, whisper_loss=0.09242, over 3851711.90 frames. ], batch size: 76, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:34:10,788 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 08:34:12,423 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 08:34:15,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=15.0 2024-08-12 08:34:24,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1550060.0, ans=0.1 2024-08-12 08:34:27,863 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 08:34:28,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1550060.0, ans=0.0 2024-08-12 08:34:40,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.497e+01 2.870e+01 3.338e+01 7.482e+01, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 08:34:44,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1550160.0, ans=0.0 2024-08-12 08:34:51,431 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10100, loss[loss=0.09555, beats_loss=0.01208, ecapa_loss=0.0001687, whisper_loss=0.08178, over 19097.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01112, ecapa_loss=0.0001806, whisper_loss=0.09232, over 3861127.15 frames. ], batch size: 76, lr: 5.69e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:35:03,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-12 08:35:33,137 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 08:35:50,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1550660.0, ans=0.0 2024-08-12 08:35:53,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1550660.0, ans=0.0 2024-08-12 08:35:54,384 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 08:35:55,944 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 08:36:04,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10150, loss[loss=0.11, beats_loss=0.01086, ecapa_loss=0.0001727, whisper_loss=0.09738, over 22451.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.0001814, whisper_loss=0.09173, over 3862259.77 frames. ], batch size: 90, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:36:14,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1550760.0, ans=0.125 2024-08-12 08:36:23,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-12 08:36:29,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-12 08:36:36,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1550960.0, ans=10.0 2024-08-12 08:36:42,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1550960.0, ans=0.1 2024-08-12 08:36:51,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1551060.0, ans=0.0 2024-08-12 08:36:53,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1551060.0, ans=0.125 2024-08-12 08:37:10,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.557e+01 2.799e+01 3.287e+01 1.688e+02, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 08:37:19,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1551160.0, ans=0.125 2024-08-12 08:37:19,289 
INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1551160.0, ans=0.125 2024-08-12 08:37:20,561 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 08:37:21,582 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10200, loss[loss=0.077, beats_loss=0.01436, ecapa_loss=0.000145, whisper_loss=0.06119, over 16478.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01102, ecapa_loss=0.0001805, whisper_loss=0.09183, over 3853799.69 frames. ], batch size: 64, lr: 5.68e-03, grad_scale: 5.764607523034235e+17 2024-08-12 08:37:24,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1551260.0, ans=0.125 2024-08-12 08:37:36,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1551360.0, ans=0.015 2024-08-12 08:37:42,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1551360.0, ans=0.125 2024-08-12 08:37:55,823 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 33 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 08:37:57,629 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
30 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 08:38:01,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1551460.0, ans=10.0 2024-08-12 08:38:01,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1551460.0, ans=0.2 2024-08-12 08:38:29,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1551660.0, ans=0.2 2024-08-12 08:38:34,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1551660.0, ans=0.125 2024-08-12 08:38:34,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2024-08-12 08:38:38,571 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10250, loss[loss=0.0782, beats_loss=0.01227, ecapa_loss=0.0002331, whisper_loss=0.0636, over 15416.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001818, whisper_loss=0.09194, over 3842885.33 frames. ], batch size: 69, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:39:13,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1551960.0, ans=0.125 2024-08-12 08:39:15,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1551960.0, ans=0.0 2024-08-12 08:39:24,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1552060.0, ans=0.0 2024-08-12 08:39:25,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2024-08-12 08:39:30,754 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
34 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 08:39:45,307 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 08:39:46,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.422e+01 2.707e+01 3.104e+01 5.382e+01, threshold=5.414e+01, percent-clipped=0.0 2024-08-12 08:39:54,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1552160.0, ans=0.05 2024-08-12 08:39:57,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10300, loss[loss=0.1152, beats_loss=0.009803, ecapa_loss=0.0001918, whisper_loss=0.1035, over 18153.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001812, whisper_loss=0.09224, over 3881689.59 frames. ], batch size: 74, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:40:03,494 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 08:40:13,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1552360.0, ans=0.125 2024-08-12 08:40:27,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1552460.0, ans=0.1 2024-08-12 08:40:43,035 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 08:40:49,612 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 08:40:49,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1552560.0, ans=0.1 2024-08-12 08:40:51,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1552560.0, ans=0.125 2024-08-12 08:41:05,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1552660.0, ans=0.125 2024-08-12 08:41:08,137 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:41:13,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2024-08-12 08:41:13,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10350, loss[loss=0.08368, beats_loss=0.01091, ecapa_loss=0.0001698, whisper_loss=0.07107, over 17613.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001799, whisper_loss=0.09218, over 3896334.03 frames. ], batch size: 69, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:41:18,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1552760.0, ans=0.125 2024-08-12 08:41:36,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1552860.0, ans=0.0 2024-08-12 08:41:51,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1552960.0, ans=0.05 2024-08-12 08:41:55,899 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 08:41:59,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-08-12 08:42:04,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1553060.0, ans=0.125 2024-08-12 08:42:10,209 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 08:42:17,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.587e+01 2.793e+01 3.199e+01 6.798e+01, threshold=5.587e+01, percent-clipped=1.0 2024-08-12 08:42:27,534 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10400, loss[loss=0.1016, beats_loss=0.009969, ecapa_loss=0.0002002, whisper_loss=0.08963, over 16981.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001811, whisper_loss=0.09159, over 3904252.49 frames. ], batch size: 67, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:42:31,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1553260.0, ans=0.125 2024-08-12 08:42:49,517 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 08:42:49,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1553360.0, ans=0.125 2024-08-12 08:42:54,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1553360.0, ans=0.0 2024-08-12 08:42:57,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1553460.0, ans=0.0 2024-08-12 08:43:01,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1553460.0, ans=0.0 2024-08-12 08:43:17,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1553560.0, ans=0.125 2024-08-12 08:43:19,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1553560.0, ans=0.125 2024-08-12 08:43:23,195 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 08:43:27,869 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 08:43:32,178 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 08:43:32,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-08-12 08:43:34,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1553660.0, ans=0.125 2024-08-12 08:43:43,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10450, loss[loss=0.114, beats_loss=0.01134, ecapa_loss=0.0001631, whisper_loss=0.101, over 23593.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001801, whisper_loss=0.09121, over 3897693.31 frames. ], batch size: 92, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:44:03,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1553860.0, ans=0.1 2024-08-12 08:44:08,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1553860.0, ans=0.125 2024-08-12 08:44:13,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1553960.0, ans=0.2 2024-08-12 08:44:22,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=12.0 2024-08-12 08:44:33,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2024-08-12 08:44:46,330 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 08:44:48,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.496e+01 2.841e+01 3.416e+01 4.859e+01, threshold=5.681e+01, percent-clipped=0.0 2024-08-12 08:44:51,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-08-12 08:44:58,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1554260.0, ans=0.125 2024-08-12 08:44:59,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10500, loss[loss=0.1258, beats_loss=0.01073, ecapa_loss=0.0001744, whisper_loss=0.1133, over 21706.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.000181, whisper_loss=0.09145, over 3901346.04 frames. 
], batch size: 85, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:45:10,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2024-08-12 08:45:26,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1554360.0, ans=0.1 2024-08-12 08:45:27,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2024-08-12 08:45:54,560 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 08:46:12,907 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10550, loss[loss=0.1128, beats_loss=0.01043, ecapa_loss=0.0001515, whisper_loss=0.1009, over 23302.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01101, ecapa_loss=0.0001814, whisper_loss=0.09123, over 3884907.20 frames. ], batch size: 90, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:46:39,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1554860.0, ans=0.0 2024-08-12 08:46:42,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1554960.0, ans=0.0 2024-08-12 08:46:59,050 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 08:47:04,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.91 vs. 
limit=12.0 2024-08-12 08:47:15,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1555160.0, ans=0.0 2024-08-12 08:47:18,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.542e+01 2.754e+01 3.046e+01 4.371e+01, threshold=5.507e+01, percent-clipped=0.0 2024-08-12 08:47:29,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10600, loss[loss=0.0972, beats_loss=0.01094, ecapa_loss=0.0001723, whisper_loss=0.08453, over 22529.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001807, whisper_loss=0.09131, over 3885371.57 frames. ], batch size: 91, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:47:52,438 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-12 08:47:53,770 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 08:48:02,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1555460.0, ans=0.1 2024-08-12 08:48:10,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1555460.0, ans=0.0 2024-08-12 08:48:15,953 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 08:48:31,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1555660.0, ans=0.125 2024-08-12 08:48:43,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10650, loss[loss=0.1033, beats_loss=0.00933, ecapa_loss=0.0001749, whisper_loss=0.09219, over 18305.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01108, ecapa_loss=0.0001792, whisper_loss=0.09148, over 3902993.24 frames. 
], batch size: 69, lr: 5.68e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:49:09,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1555860.0, ans=0.2 2024-08-12 08:49:29,127 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 08:49:29,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1556060.0, ans=0.0 2024-08-12 08:49:33,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1556060.0, ans=0.125 2024-08-12 08:49:39,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.14 vs. limit=15.0 2024-08-12 08:49:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1556060.0, ans=0.1 2024-08-12 08:49:47,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.652e+01 2.957e+01 3.394e+01 5.576e+01, threshold=5.914e+01, percent-clipped=1.0 2024-08-12 08:49:48,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1556160.0, ans=0.125 2024-08-12 08:49:58,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10700, loss[loss=0.1176, beats_loss=0.01224, ecapa_loss=0.0001238, whisper_loss=0.1041, over 22859.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0111, ecapa_loss=0.0001776, whisper_loss=0.09191, over 3903247.15 frames. ], batch size: 85, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:50:03,400 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 32 from Vox, 28 fro AS 2024-08-12 08:50:44,374 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 08:51:12,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10750, loss[loss=0.09727, beats_loss=0.01235, ecapa_loss=0.0001694, whisper_loss=0.08323, over 22663.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01107, ecapa_loss=0.0001773, whisper_loss=0.09268, over 3896451.21 frames. ], batch size: 91, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:51:21,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-12 08:51:36,680 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 08:51:40,391 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 34 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 08:51:43,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1556960.0, ans=0.025 2024-08-12 08:51:57,718 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 08:52:03,652 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 08:52:03,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1557060.0, ans=0.0 2024-08-12 08:52:11,066 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 08:52:17,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.513e+01 2.826e+01 3.158e+01 5.993e+01, threshold=5.652e+01, percent-clipped=1.0 2024-08-12 08:52:28,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10800, loss[loss=0.09296, beats_loss=0.01232, ecapa_loss=0.000168, whisper_loss=0.07895, over 21507.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01106, ecapa_loss=0.0001771, whisper_loss=0.09315, over 3890093.36 frames. ], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:52:31,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1557260.0, ans=0.125 2024-08-12 08:52:58,513 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-12 08:53:02,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=12.0 2024-08-12 08:53:03,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1557460.0, ans=0.07 2024-08-12 08:53:08,781 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 29 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 08:53:11,715 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-12 08:53:12,955 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 08:53:15,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1557560.0, ans=0.1 2024-08-12 08:53:18,277 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-12 08:53:22,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1557560.0, ans=0.125 2024-08-12 08:53:41,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1557760.0, ans=0.125 2024-08-12 08:53:42,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10850, loss[loss=0.113, beats_loss=0.01152, ecapa_loss=0.0001264, whisper_loss=0.1002, over 23340.00 frames. 
], tot_loss[loss=0.1055, beats_loss=0.01112, ecapa_loss=0.0001776, whisper_loss=0.09257, over 3904301.52 frames. ], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:53:52,596 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 08:54:19,855 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 08:54:33,288 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 08:54:33,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1558060.0, ans=0.125 2024-08-12 08:54:36,403 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 08:54:46,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1558160.0, ans=0.0 2024-08-12 08:54:47,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.594e+01 2.957e+01 3.345e+01 7.139e+01, threshold=5.915e+01, percent-clipped=2.0 2024-08-12 08:54:53,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1558160.0, ans=0.125 2024-08-12 08:54:53,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-12 08:54:57,486 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 08:54:59,333 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10900, loss[loss=0.1141, beats_loss=0.009267, ecapa_loss=0.0001774, whisper_loss=0.1031, over 22243.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01116, ecapa_loss=0.0001767, whisper_loss=0.09231, over 3898315.53 frames. 
], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:55:16,812 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.295e-01 2024-08-12 08:55:24,995 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 08:55:26,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1558360.0, ans=0.125 2024-08-12 08:56:08,968 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-12 08:56:15,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-12 08:56:18,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 10950, loss[loss=0.1056, beats_loss=0.01167, ecapa_loss=0.0001764, whisper_loss=0.09217, over 22155.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01119, ecapa_loss=0.0001772, whisper_loss=0.09196, over 3900605.35 frames. ], batch size: 90, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:56:40,311 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 08:56:58,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1558960.0, ans=0.0 2024-08-12 08:57:06,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1558960.0, ans=0.1 2024-08-12 08:57:13,962 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 08:57:29,098 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 08:57:35,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1559160.0, ans=0.0 2024-08-12 08:57:36,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.556e+01 2.763e+01 3.156e+01 4.815e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 08:57:50,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11000, loss[loss=0.1106, beats_loss=0.01167, ecapa_loss=0.0001379, whisper_loss=0.09755, over 19082.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01109, ecapa_loss=0.0001792, whisper_loss=0.09258, over 3902578.93 frames. ], batch size: 74, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:57:54,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-12 08:58:08,252 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 08:58:08,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1559360.0, ans=0.125 2024-08-12 08:58:17,882 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 08:58:24,717 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 08:58:27,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. 
limit=22.5 2024-08-12 08:58:36,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1559460.0, ans=0.0 2024-08-12 08:58:37,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1559460.0, ans=0.0 2024-08-12 08:58:41,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559560.0, ans=0.1 2024-08-12 08:59:05,746 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 14 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 08:59:06,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1559660.0, ans=0.1 2024-08-12 08:59:09,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1559660.0, ans=0.04949747468305833 2024-08-12 08:59:13,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11050, loss[loss=0.07263, beats_loss=0.01569, ecapa_loss=0.0001379, whisper_loss=0.05556, over 20814.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0111, ecapa_loss=0.0001791, whisper_loss=0.09227, over 3906735.38 frames. ], batch size: 87, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 08:59:20,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1559760.0, ans=0.1 2024-08-12 08:59:33,630 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 08:59:54,770 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 09:00:05,081 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-156000.pt 2024-08-12 09:00:26,171 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 09:00:27,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1560060.0, ans=0.125 2024-08-12 09:00:37,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1560060.0, ans=0.0 2024-08-12 09:00:49,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.392e+01 2.745e+01 3.211e+01 4.714e+01, threshold=5.490e+01, percent-clipped=0.0 2024-08-12 09:00:53,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1560160.0, ans=0.035 2024-08-12 09:01:02,304 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 09:01:04,731 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11100, loss[loss=0.1073, beats_loss=0.01216, ecapa_loss=0.0002108, whisper_loss=0.09299, over 17703.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01109, ecapa_loss=0.0001792, whisper_loss=0.09226, over 3899627.42 frames. 
], batch size: 72, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:01:08,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1560260.0, ans=0.125 2024-08-12 09:01:12,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1560260.0, ans=0.0 2024-08-12 09:01:19,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1560260.0, ans=0.125 2024-08-12 09:01:25,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1560360.0, ans=0.0 2024-08-12 09:01:47,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1560360.0, ans=0.0 2024-08-12 09:01:51,135 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 09:02:03,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1560460.0, ans=0.0 2024-08-12 09:02:41,449 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 09:02:45,405 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-12 09:02:50,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1560660.0, ans=0.125 2024-08-12 09:02:52,202 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 27 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-12 09:02:59,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11150, loss[loss=0.1189, beats_loss=0.008147, ecapa_loss=0.0001983, whisper_loss=0.1088, over 22496.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01098, ecapa_loss=0.0001781, whisper_loss=0.09266, over 3870280.52 frames. 
], batch size: 88, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:03:13,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1560760.0, ans=0.125 2024-08-12 09:03:37,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1560860.0, ans=0.125 2024-08-12 09:03:46,579 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 09:03:59,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1560960.0, ans=0.125 2024-08-12 09:04:15,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1561060.0, ans=0.1 2024-08-12 09:04:18,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1561060.0, ans=0.125 2024-08-12 09:04:27,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-08-12 09:04:29,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.591e+01 2.914e+01 3.431e+01 1.120e+02, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 09:04:35,955 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 09:04:37,549 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 09:04:40,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11200, loss[loss=0.0918, beats_loss=0.0118, ecapa_loss=0.0001643, whisper_loss=0.07836, over 23049.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001789, whisper_loss=0.09242, over 3872214.32 frames. 
], batch size: 91, lr: 5.67e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:05:03,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1561360.0, ans=0.0 2024-08-12 09:05:03,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-12 09:05:09,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1561460.0, ans=0.125 2024-08-12 09:05:23,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. limit=10.0 2024-08-12 09:05:25,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1561560.0, ans=0.2 2024-08-12 09:05:35,900 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 09:05:55,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11250, loss[loss=0.07977, beats_loss=0.01255, ecapa_loss=0.0001574, whisper_loss=0.06565, over 17121.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01097, ecapa_loss=0.0001785, whisper_loss=0.0926, over 3857996.57 frames. 
], batch size: 73, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:05:57,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1561760.0, ans=0.2 2024-08-12 09:06:02,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1561760.0, ans=0.1 2024-08-12 09:06:08,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1561760.0, ans=0.125 2024-08-12 09:06:19,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-08-12 09:06:26,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1561960.0, ans=0.125 2024-08-12 09:07:00,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.463e+01 2.812e+01 3.090e+01 4.861e+01, threshold=5.624e+01, percent-clipped=0.0 2024-08-12 09:07:01,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1562160.0, ans=0.125 2024-08-12 09:07:09,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1562160.0, ans=0.1 2024-08-12 09:07:12,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11300, loss[loss=0.1075, beats_loss=0.005706, ecapa_loss=0.0002283, whisper_loss=0.0995, over 15431.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01102, ecapa_loss=0.000178, whisper_loss=0.09199, over 3888373.99 frames. ], batch size: 59, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:07:13,810 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
27 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-12 09:07:22,464 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 09:07:24,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1562260.0, ans=0.0 2024-08-12 09:07:49,574 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 09:07:54,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2024-08-12 09:07:55,271 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 09:08:18,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1562660.0, ans=0.125 2024-08-12 09:08:27,141 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11350, loss[loss=0.1125, beats_loss=0.01055, ecapa_loss=0.0002176, whisper_loss=0.09981, over 19968.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001778, whisper_loss=0.09232, over 3895279.77 frames. ], batch size: 81, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:08:27,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1562760.0, ans=0.0 2024-08-12 09:08:59,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1562960.0, ans=0.125 2024-08-12 09:09:01,259 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 09:09:09,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. 
limit=15.0 2024-08-12 09:09:12,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1563060.0, ans=0.125 2024-08-12 09:09:32,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.664e+01 2.943e+01 3.527e+01 6.465e+01, threshold=5.886e+01, percent-clipped=3.0 2024-08-12 09:09:32,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1563160.0, ans=0.1 2024-08-12 09:09:43,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11400, loss[loss=0.07915, beats_loss=0.01103, ecapa_loss=0.000187, whisper_loss=0.06625, over 15335.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001781, whisper_loss=0.09217, over 3894048.08 frames. ], batch size: 62, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:09:47,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-12 09:10:00,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1563360.0, ans=0.125 2024-08-12 09:10:12,340 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 09:10:17,018 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 09:10:23,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1563460.0, ans=0.125 2024-08-12 09:10:33,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1563560.0, ans=10.0 2024-08-12 09:10:45,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=1563660.0, ans=15.0 2024-08-12 09:10:48,392 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-12 09:10:52,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1563660.0, ans=0.0 2024-08-12 09:10:54,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-12 09:10:58,672 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11450, loss[loss=0.09045, beats_loss=0.01369, ecapa_loss=0.000145, whisper_loss=0.07531, over 16970.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01104, ecapa_loss=0.0001776, whisper_loss=0.0919, over 3894850.19 frames. ], batch size: 69, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:11:03,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1563760.0, ans=0.2 2024-08-12 09:11:07,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1563760.0, ans=0.125 2024-08-12 09:11:18,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.01 vs. 
limit=15.0 2024-08-12 09:11:59,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1564160.0, ans=0.125 2024-08-12 09:12:01,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.222e+01 2.691e+01 2.984e+01 3.648e+01 5.377e+01, threshold=5.967e+01, percent-clipped=0.0 2024-08-12 09:12:12,738 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11500, loss[loss=0.1301, beats_loss=0.008499, ecapa_loss=0.0001957, whisper_loss=0.1197, over 22128.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001785, whisper_loss=0.09251, over 3909165.13 frames. ], batch size: 85, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:12:18,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1564260.0, ans=0.2 2024-08-12 09:12:30,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1564360.0, ans=0.0 2024-08-12 09:12:31,114 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 09:12:41,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1564460.0, ans=0.0 2024-08-12 09:12:42,608 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 09:13:18,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-12 09:13:26,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11550, loss[loss=0.1373, beats_loss=0.0083, ecapa_loss=0.0001868, whisper_loss=0.1271, over 16706.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001797, whisper_loss=0.0924, over 3924521.56 frames. 
], batch size: 61, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:13:32,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1564760.0, ans=0.125 2024-08-12 09:13:46,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1564860.0, ans=0.0 2024-08-12 09:13:57,479 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 09:14:30,977 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 09:14:32,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.539e+01 2.783e+01 3.251e+01 6.274e+01, threshold=5.566e+01, percent-clipped=2.0 2024-08-12 09:14:36,563 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:14:41,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11600, loss[loss=0.1067, beats_loss=0.01257, ecapa_loss=0.0001226, whisper_loss=0.09294, over 23203.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001776, whisper_loss=0.09197, over 3925939.40 frames. ], batch size: 89, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:14:50,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2024-08-12 09:14:58,194 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 09:15:02,483 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 09:15:11,722 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 09:15:28,089 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
18 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-12 09:15:52,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11650, loss[loss=0.12, beats_loss=0.009451, ecapa_loss=0.0001626, whisper_loss=0.1089, over 14676.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01099, ecapa_loss=0.0001792, whisper_loss=0.09246, over 3930970.29 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:15:56,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1565760.0, ans=0.2 2024-08-12 09:16:21,608 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 09:16:21,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1565960.0, ans=0.125 2024-08-12 09:16:33,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=15.0 2024-08-12 09:16:35,488 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 09:16:45,250 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 16 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 09:16:53,049 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.448e+01 2.832e+01 3.122e+01 7.544e+01, threshold=5.665e+01, percent-clipped=2.0 2024-08-12 09:17:03,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11700, loss[loss=0.1096, beats_loss=0.01256, ecapa_loss=0.0001524, whisper_loss=0.09548, over 14533.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001781, whisper_loss=0.09221, over 3915388.24 frames. 
], batch size: 57, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:17:21,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1566360.0, ans=0.0 2024-08-12 09:17:29,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1566460.0, ans=0.0 2024-08-12 09:17:52,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1566560.0, ans=0.2 2024-08-12 09:17:59,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1566660.0, ans=0.125 2024-08-12 09:18:03,878 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 09:18:05,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=22.5 2024-08-12 09:18:11,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11750, loss[loss=0.09534, beats_loss=0.01096, ecapa_loss=0.0002571, whisper_loss=0.08182, over 21390.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01112, ecapa_loss=0.0001798, whisper_loss=0.09197, over 3908924.89 frames. 
], batch size: 92, lr: 5.66e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:18:13,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1566760.0, ans=0.125 2024-08-12 09:18:15,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1566760.0, ans=0.04949747468305833 2024-08-12 09:18:23,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1566760.0, ans=0.0 2024-08-12 09:18:25,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1566860.0, ans=0.2 2024-08-12 09:18:32,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5 2024-08-12 09:18:35,740 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 09:18:39,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1566960.0, ans=0.05 2024-08-12 09:18:46,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=12.0 2024-08-12 09:18:48,912 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
33 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 09:19:03,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1567060.0, ans=0.125 2024-08-12 09:19:12,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.551e+01 2.829e+01 3.227e+01 5.711e+01, threshold=5.658e+01, percent-clipped=1.0 2024-08-12 09:19:22,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11800, loss[loss=0.1066, beats_loss=0.0116, ecapa_loss=0.0001611, whisper_loss=0.09338, over 19500.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.0001808, whisper_loss=0.09189, over 3892273.21 frames. ], batch size: 77, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:19:24,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567260.0, ans=0.1 2024-08-12 09:19:46,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-08-12 09:19:52,019 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 09:19:56,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1567460.0, ans=0.125 2024-08-12 09:20:04,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1567560.0, ans=0.1 2024-08-12 09:20:04,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1567560.0, ans=0.1 2024-08-12 09:20:11,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1567560.0, ans=0.125 2024-08-12 09:20:31,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11850, loss[loss=0.09038, beats_loss=0.01274, ecapa_loss=0.0002043, whisper_loss=0.0756, over 17639.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01102, ecapa_loss=0.000181, whisper_loss=0.09237, over 3907395.44 frames. ], batch size: 72, lr: 5.65e-03, grad_scale: 1.152921504606847e+18 2024-08-12 09:20:37,338 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 09:20:41,317 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 09:21:01,723 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 09:21:25,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1568160.0, ans=0.1 2024-08-12 09:21:31,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.485e+01 2.770e+01 3.068e+01 4.213e+01, threshold=5.539e+01, percent-clipped=0.0 2024-08-12 09:21:39,560 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11900, loss[loss=0.1342, beats_loss=0.01031, ecapa_loss=0.0001996, whisper_loss=0.1219, over 17778.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.0111, ecapa_loss=0.0001804, whisper_loss=0.09207, over 3921666.01 frames. ], batch size: 72, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:21:54,144 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 09:21:55,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1568360.0, ans=0.0 2024-08-12 09:22:08,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1568460.0, ans=0.125 2024-08-12 09:22:21,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-12 09:22:25,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1568560.0, ans=0.0 2024-08-12 09:22:27,921 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 09:22:43,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-12 09:22:49,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 11950, loss[loss=0.1225, beats_loss=0.01047, ecapa_loss=0.0001044, whisper_loss=0.1109, over 16373.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01099, ecapa_loss=0.0001802, whisper_loss=0.09224, over 3872277.34 frames. ], batch size: 58, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:22:54,387 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 09:23:10,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1568860.0, ans=0.125 2024-08-12 09:23:16,752 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 09:23:38,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1569060.0, ans=0.125 2024-08-12 09:23:51,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-12 09:23:51,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.558e+01 2.859e+01 3.291e+01 5.466e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 09:23:56,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-12 09:23:59,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1569260.0, ans=0.125 2024-08-12 09:24:00,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12000, loss[loss=0.1053, beats_loss=0.008354, ecapa_loss=0.000186, whisper_loss=0.0951, over 15181.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001798, whisper_loss=0.09215, over 3877103.31 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:24:00,142 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 09:24:39,958 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0006057, whisper_loss=0.2491, over 922467.00 frames. 
2024-08-12 09:24:56,773 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on SV_voxceleb1: loss=0.004842, beats_loss=0, ecapa_loss=0.0004842, whisper_loss=0, over 939242.00 frames. 2024-08-12 09:26:51,032 INFO [train_multi_KD3.py:1149] (0/4) Epoch 11, validation on AT_audioset: loss=0.02454, beats_loss=0.02454, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 09:26:51,036 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 09:26:53,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2024-08-12 09:26:55,559 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 09:26:56,987 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 09:26:58,355 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 09:26:58,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. 
limit=22.5 2024-08-12 09:27:14,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1569360.0, ans=0.125 2024-08-12 09:27:25,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1569460.0, ans=0.2 2024-08-12 09:27:52,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1569660.0, ans=0.0 2024-08-12 09:27:55,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1569660.0, ans=0.125 2024-08-12 09:28:01,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12050, loss[loss=0.1279, beats_loss=0.006843, ecapa_loss=0.0001979, whisper_loss=0.1191, over 17359.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001788, whisper_loss=0.0922, over 3863324.06 frames. ], batch size: 62, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:28:06,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1569760.0, ans=0.125 2024-08-12 09:28:07,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1569760.0, ans=0.0 2024-08-12 09:28:51,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1570060.0, ans=0.0 2024-08-12 09:29:03,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.535e+01 2.943e+01 3.446e+01 4.689e+01, threshold=5.887e+01, percent-clipped=0.0 2024-08-12 09:29:12,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12100, loss[loss=0.09754, beats_loss=0.01069, ecapa_loss=0.0001772, whisper_loss=0.08509, over 17768.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01108, ecapa_loss=0.0001789, whisper_loss=0.09225, over 3860370.77 frames. ], batch size: 73, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:29:15,056 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 09:30:22,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12150, loss[loss=0.1089, beats_loss=0.01029, ecapa_loss=0.0002122, whisper_loss=0.09647, over 17687.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01102, ecapa_loss=0.0001799, whisper_loss=0.09252, over 3888291.63 frames. ], batch size: 71, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:30:56,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-12 09:31:03,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1570960.0, ans=0.04949747468305833 2024-08-12 09:31:13,836 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 09:31:22,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1571160.0, ans=0.125 2024-08-12 09:31:23,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1571160.0, ans=0.0 2024-08-12 09:31:25,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.520e+01 2.822e+01 3.048e+01 5.048e+01, threshold=5.643e+01, percent-clipped=0.0 2024-08-12 09:31:34,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.08 vs. 
limit=6.0 2024-08-12 09:31:34,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.53 vs. limit=22.5 2024-08-12 09:31:34,919 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12200, loss[loss=0.1051, beats_loss=0.01277, ecapa_loss=0.0001377, whisper_loss=0.09091, over 16976.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01105, ecapa_loss=0.0001786, whisper_loss=0.09205, over 3870504.04 frames. ], batch size: 66, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:31:59,454 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 09:32:01,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1571360.0, ans=0.125 2024-08-12 09:32:05,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1571460.0, ans=0.0 2024-08-12 09:32:06,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1571460.0, ans=0.125 2024-08-12 09:32:41,204 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 09:32:42,450 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 13 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 09:32:47,416 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12250, loss[loss=0.07487, beats_loss=0.01162, ecapa_loss=0.0001914, whisper_loss=0.06134, over 17569.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01111, ecapa_loss=0.0001777, whisper_loss=0.09187, over 3868257.07 frames. ], batch size: 72, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:32:50,444 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 09:32:52,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1571760.0, ans=0.1 2024-08-12 09:33:00,474 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 09:33:20,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.97 vs. limit=22.5 2024-08-12 09:33:21,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1571960.0, ans=0.125 2024-08-12 09:33:28,596 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 09:33:33,716 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-12 09:33:50,828 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 09:33:51,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.601e+01 2.927e+01 3.328e+01 4.694e+01, threshold=5.855e+01, percent-clipped=0.0 2024-08-12 09:33:55,024 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 09:34:00,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12300, loss[loss=0.1142, beats_loss=0.01126, ecapa_loss=0.0001678, whisper_loss=0.1013, over 21685.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01107, ecapa_loss=0.0001791, whisper_loss=0.092, over 3863957.71 frames. ], batch size: 91, lr: 5.65e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:34:18,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.90 vs. 
limit=22.5 2024-08-12 09:34:20,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1572360.0, ans=0.025 2024-08-12 09:34:23,382 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 09:34:33,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1572460.0, ans=10.0 2024-08-12 09:34:34,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1572460.0, ans=0.125 2024-08-12 09:34:45,032 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 32 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 09:35:06,843 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 09:35:11,008 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 09:35:12,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12350, loss[loss=0.1043, beats_loss=0.011, ecapa_loss=0.0001379, whisper_loss=0.09193, over 21279.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001797, whisper_loss=0.09169, over 3836319.39 frames. ], batch size: 82, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:35:12,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1572760.0, ans=0.5 2024-08-12 09:35:22,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.25 vs. limit=15.0 2024-08-12 09:35:33,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. 
limit=22.5 2024-08-12 09:35:55,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1573060.0, ans=0.125 2024-08-12 09:35:56,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1573060.0, ans=0.125 2024-08-12 09:36:08,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1573060.0, ans=0.1 2024-08-12 09:36:16,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.617e+01 3.064e+01 3.584e+01 5.581e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 09:36:25,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12400, loss[loss=0.1111, beats_loss=0.01123, ecapa_loss=0.0001781, whisper_loss=0.09811, over 21995.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01108, ecapa_loss=0.0001783, whisper_loss=0.09161, over 3878204.54 frames. ], batch size: 90, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:36:40,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-12 09:36:49,475 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 09:36:56,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1573460.0, ans=0.0 2024-08-12 09:36:56,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. 
limit=15.0 2024-08-12 09:37:04,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1573460.0, ans=0.2 2024-08-12 09:37:05,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1573460.0, ans=0.0 2024-08-12 09:37:08,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1573560.0, ans=0.125 2024-08-12 09:37:14,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1573560.0, ans=0.125 2024-08-12 09:37:20,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1573560.0, ans=0.125 2024-08-12 09:37:38,182 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12450, loss[loss=0.114, beats_loss=0.009541, ecapa_loss=0.0001713, whisper_loss=0.1028, over 17629.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01112, ecapa_loss=0.0001768, whisper_loss=0.09147, over 3877221.21 frames. ], batch size: 67, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:37:46,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-08-12 09:37:49,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.23 vs. 
limit=10.0 2024-08-12 09:37:54,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1573860.0, ans=0.125 2024-08-12 09:37:55,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1573860.0, ans=0.125 2024-08-12 09:38:24,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1574060.0, ans=0.0 2024-08-12 09:38:25,501 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 09:38:27,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1574060.0, ans=0.0 2024-08-12 09:38:36,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=22.5 2024-08-12 09:38:39,844 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 09:38:40,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.466e+01 2.753e+01 3.048e+01 4.353e+01, threshold=5.506e+01, percent-clipped=0.0 2024-08-12 09:38:41,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1574160.0, ans=0.1 2024-08-12 09:38:49,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12500, loss[loss=0.1111, beats_loss=0.01085, ecapa_loss=0.0002167, whisper_loss=0.09812, over 14353.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001773, whisper_loss=0.09154, over 3871839.95 frames. ], batch size: 60, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:39:03,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.33 vs. 
limit=22.5 2024-08-12 09:39:19,319 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 20 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 09:39:21,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-12 09:39:22,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1574460.0, ans=0.1 2024-08-12 09:39:42,110 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:39:52,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1574660.0, ans=0.125 2024-08-12 09:39:59,010 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12550, loss[loss=0.08417, beats_loss=0.0121, ecapa_loss=0.0001389, whisper_loss=0.07069, over 14668.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001766, whisper_loss=0.09227, over 3884724.99 frames. ], batch size: 56, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:40:11,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1574760.0, ans=0.125 2024-08-12 09:40:20,579 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 09:40:21,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.61 vs. 
limit=15.0 2024-08-12 09:40:40,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1575060.0, ans=0.125 2024-08-12 09:40:41,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1575060.0, ans=0.1 2024-08-12 09:40:46,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1575060.0, ans=0.07 2024-08-12 09:40:56,420 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 12 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 09:41:01,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.516e+01 2.754e+01 3.207e+01 3.892e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 09:41:01,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1575160.0, ans=0.0 2024-08-12 09:41:10,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12600, loss[loss=0.08563, beats_loss=0.01113, ecapa_loss=0.0002775, whisper_loss=0.07172, over 12705.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001774, whisper_loss=0.0932, over 3868013.84 frames. ], batch size: 55, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:41:12,155 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 09:41:15,719 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-12 09:41:26,738 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 09:41:40,990 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 09:41:44,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1575460.0, ans=0.04949747468305833 2024-08-12 09:42:02,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1575560.0, ans=0.125 2024-08-12 09:42:12,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0 2024-08-12 09:42:20,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12650, loss[loss=0.1065, beats_loss=0.009938, ecapa_loss=0.000202, whisper_loss=0.09457, over 17021.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01095, ecapa_loss=0.0001788, whisper_loss=0.09336, over 3880287.53 frames. ], batch size: 69, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:42:22,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1575760.0, ans=0.5 2024-08-12 09:42:38,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1575860.0, ans=0.0 2024-08-12 09:42:57,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1575960.0, ans=0.125 2024-08-12 09:43:01,584 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 09:43:16,660 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 09:43:19,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1576160.0, ans=0.125 2024-08-12 09:43:22,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.519e+01 2.747e+01 3.019e+01 4.514e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 09:43:30,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12700, loss[loss=0.1284, beats_loss=0.0083, ecapa_loss=0.000174, whisper_loss=0.1183, over 20038.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01097, ecapa_loss=0.000179, whisper_loss=0.09336, over 3891697.61 frames. ], batch size: 74, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:43:32,366 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 09:43:33,650 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 09:44:06,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1576460.0, ans=0.05 2024-08-12 09:44:12,222 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 09:44:14,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1576560.0, ans=0.0 2024-08-12 09:44:15,347 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 09:44:15,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1576560.0, ans=0.2 2024-08-12 09:44:20,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1576560.0, ans=0.125 2024-08-12 09:44:21,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2024-08-12 09:44:37,629 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 09:44:41,541 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12750, loss[loss=0.09452, beats_loss=0.01304, ecapa_loss=0.000136, whisper_loss=0.08012, over 16841.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01109, ecapa_loss=0.000178, whisper_loss=0.09246, over 3877229.77 frames. ], batch size: 64, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:44:42,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-12 09:44:57,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1576860.0, ans=0.1 2024-08-12 09:45:00,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1576860.0, ans=0.125 2024-08-12 09:45:04,201 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 09:45:08,772 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 09:45:19,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1576960.0, ans=0.125 2024-08-12 09:45:39,185 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-12 09:45:42,020 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 09:45:43,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.827e+01 3.190e+01 5.112e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 09:45:47,733 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 09:45:51,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12800, loss[loss=0.1007, beats_loss=0.01328, ecapa_loss=0.0001558, whisper_loss=0.08585, over 22210.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01111, ecapa_loss=0.0001807, whisper_loss=0.09209, over 3859955.41 frames. ], batch size: 91, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:46:09,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1577360.0, ans=0.0 2024-08-12 09:46:23,479 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 09:46:44,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1577560.0, ans=0.0 2024-08-12 09:47:02,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12850, loss[loss=0.11, beats_loss=0.012, ecapa_loss=0.000168, whisper_loss=0.09631, over 16517.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01115, ecapa_loss=0.0001786, whisper_loss=0.09205, over 3854611.68 frames. 
], batch size: 64, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:47:03,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-08-12 09:47:08,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1577760.0, ans=0.125 2024-08-12 09:47:10,873 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 09:47:11,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1577760.0, ans=0.125 2024-08-12 09:47:13,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1577760.0, ans=0.0 2024-08-12 09:47:22,649 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 09:47:24,093 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 09:47:24,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1577860.0, ans=0.125 2024-08-12 09:47:37,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1577960.0, ans=0.0 2024-08-12 09:48:00,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1578160.0, ans=0.2 2024-08-12 09:48:03,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1578160.0, ans=0.125 2024-08-12 09:48:04,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.513e+01 2.797e+01 3.147e+01 4.860e+01, threshold=5.595e+01, percent-clipped=0.0 2024-08-12 09:48:12,718 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12900, loss[loss=0.08832, beats_loss=0.01142, ecapa_loss=0.0001536, whisper_loss=0.07536, over 18745.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.0001796, whisper_loss=0.09143, over 3796348.67 frames. ], batch size: 73, lr: 5.64e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:48:24,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1578260.0, ans=0.09899494936611666 2024-08-12 09:48:28,121 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 09:48:29,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1578360.0, ans=0.0 2024-08-12 09:48:49,305 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 09:48:53,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1578560.0, ans=0.2 2024-08-12 09:49:00,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1578560.0, ans=0.2 2024-08-12 09:49:21,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 12950, loss[loss=0.09054, beats_loss=0.01429, ecapa_loss=0.0001411, whisper_loss=0.07483, over 18557.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001809, whisper_loss=0.09159, over 3821888.55 frames. ], batch size: 77, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:49:25,376 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 34 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 09:49:40,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1578860.0, ans=0.125 2024-08-12 09:49:48,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1578860.0, ans=0.0 2024-08-12 09:50:14,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1579060.0, ans=0.1 2024-08-12 09:50:15,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1579060.0, ans=0.125 2024-08-12 09:50:24,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.602e+01 2.996e+01 3.291e+01 5.195e+01, threshold=5.992e+01, percent-clipped=0.0 2024-08-12 09:50:33,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13000, loss[loss=0.1155, beats_loss=0.01058, ecapa_loss=0.0001861, whisper_loss=0.1031, over 21063.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01105, ecapa_loss=0.0001795, whisper_loss=0.09159, over 3857685.70 frames. ], batch size: 85, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:50:35,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1579260.0, ans=0.125 2024-08-12 09:50:38,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-12 09:50:56,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1579360.0, ans=0.05 2024-08-12 09:50:59,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1579360.0, ans=0.035 2024-08-12 09:51:15,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-12 09:51:26,343 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 19 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-12 09:51:26,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1579560.0, ans=0.0 2024-08-12 09:51:43,505 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 09:51:44,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13050, loss[loss=0.1123, beats_loss=0.009309, ecapa_loss=0.0001667, whisper_loss=0.1013, over 23696.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01101, ecapa_loss=0.0001796, whisper_loss=0.0915, over 3863249.96 frames. 
], batch size: 88, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:51:46,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1579760.0, ans=0.125 2024-08-12 09:52:02,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1579860.0, ans=0.1 2024-08-12 09:52:06,037 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 09:52:13,093 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 09:52:14,217 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 09:52:46,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.449e+01 2.683e+01 3.089e+01 1.742e+02, threshold=5.367e+01, percent-clipped=1.0 2024-08-12 09:52:54,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13100, loss[loss=0.1276, beats_loss=0.01155, ecapa_loss=0.0001345, whisper_loss=0.1147, over 24090.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01104, ecapa_loss=0.0001794, whisper_loss=0.09149, over 3859662.05 frames. ], batch size: 88, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:52:59,350 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 09:53:22,111 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.971e-02 2024-08-12 09:53:34,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1580460.0, ans=0.125 2024-08-12 09:53:34,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1580460.0, ans=0.125 2024-08-12 09:53:40,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1580560.0, ans=0.125 2024-08-12 09:53:45,994 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 09:53:51,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-12 09:53:57,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2024-08-12 09:53:58,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1580660.0, ans=0.125 2024-08-12 09:54:05,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13150, loss[loss=0.1198, beats_loss=0.009664, ecapa_loss=0.0001783, whisper_loss=0.1084, over 22452.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001795, whisper_loss=0.09166, over 3856419.86 frames. 
], batch size: 86, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:54:15,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1580760.0, ans=0.125 2024-08-12 09:54:15,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1580760.0, ans=0.125 2024-08-12 09:54:17,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0 2024-08-12 09:54:23,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1580860.0, ans=0.0 2024-08-12 09:54:25,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1580860.0, ans=0.0 2024-08-12 09:54:30,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1580860.0, ans=0.07 2024-08-12 09:54:31,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1580860.0, ans=0.1 2024-08-12 09:54:41,322 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 09:54:44,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1580960.0, ans=0.125 2024-08-12 09:54:56,635 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 09:55:07,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.633e+01 2.862e+01 3.411e+01 5.758e+01, threshold=5.724e+01, percent-clipped=1.0 2024-08-12 09:55:16,686 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13200, loss[loss=0.1169, beats_loss=0.01123, ecapa_loss=0.0001931, whisper_loss=0.1037, over 22955.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01091, ecapa_loss=0.0001803, whisper_loss=0.09209, over 3848250.73 frames. ], batch size: 92, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:55:26,559 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 34 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 09:55:31,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1581360.0, ans=0.125 2024-08-12 09:55:32,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1581360.0, ans=0.1 2024-08-12 09:55:41,322 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 09:55:52,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1581460.0, ans=0.125 2024-08-12 09:55:52,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1581460.0, ans=0.2 2024-08-12 09:56:15,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1581660.0, ans=0.2 2024-08-12 09:56:25,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1581660.0, ans=0.07 2024-08-12 09:56:27,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13250, loss[loss=0.1062, beats_loss=0.01044, ecapa_loss=0.0001913, whisper_loss=0.09383, over 
15476.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01091, ecapa_loss=0.0001797, whisper_loss=0.09216, over 3855545.29 frames. ], batch size: 62, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:56:34,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1581760.0, ans=0.125 2024-08-12 09:56:40,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1581760.0, ans=0.125 2024-08-12 09:56:44,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1581860.0, ans=0.125 2024-08-12 09:56:48,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1581860.0, ans=0.125 2024-08-12 09:56:51,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1581860.0, ans=0.125 2024-08-12 09:57:00,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1581960.0, ans=0.0 2024-08-12 09:57:01,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1581960.0, ans=0.125 2024-08-12 09:57:17,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1582060.0, ans=0.2 2024-08-12 09:57:20,801 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-12 09:57:22,177 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 09:57:25,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1582160.0, ans=0.125 2024-08-12 09:57:30,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.187e+01 2.621e+01 2.894e+01 3.453e+01 5.158e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-12 09:57:38,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1582260.0, ans=0.125 2024-08-12 09:57:38,978 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13300, loss[loss=0.1227, beats_loss=0.009758, ecapa_loss=0.0001936, whisper_loss=0.111, over 21665.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01091, ecapa_loss=0.0001788, whisper_loss=0.09277, over 3855721.26 frames. ], batch size: 87, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:57:46,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-12 09:57:57,236 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-12 09:58:05,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1582460.0, ans=0.125 2024-08-12 09:58:08,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1582460.0, ans=0.125 2024-08-12 09:58:10,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1582460.0, ans=0.2 2024-08-12 09:58:31,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. 
limit=15.0 2024-08-12 09:58:42,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1582660.0, ans=0.125 2024-08-12 09:58:43,904 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 09:58:49,836 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13350, loss[loss=0.09699, beats_loss=0.01269, ecapa_loss=0.0001712, whisper_loss=0.08259, over 21095.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01103, ecapa_loss=0.0001773, whisper_loss=0.09197, over 3865615.02 frames. ], batch size: 86, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 09:58:52,846 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-12 09:59:08,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1582860.0, ans=0.05 2024-08-12 09:59:14,343 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 09:59:20,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1582960.0, ans=0.0 2024-08-12 09:59:24,094 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 09:59:29,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5 2024-08-12 09:59:29,871 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:59:35,360 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 09:59:36,664 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
27 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 09:59:38,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1583060.0, ans=0.0 2024-08-12 09:59:43,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1583060.0, ans=0.125 2024-08-12 09:59:46,354 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-12 09:59:46,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1583160.0, ans=0.125 2024-08-12 09:59:47,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1583160.0, ans=0.125 2024-08-12 09:59:51,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.599e+01 2.960e+01 3.368e+01 5.094e+01, threshold=5.919e+01, percent-clipped=0.0 2024-08-12 09:59:57,484 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 27 from LS+wenet, 15 from Vox, 14 fro AS 2024-08-12 10:00:00,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13400, loss[loss=0.09679, beats_loss=0.0106, ecapa_loss=0.0002059, whisper_loss=0.08412, over 16929.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01101, ecapa_loss=0.0001776, whisper_loss=0.09233, over 3860936.35 frames. 
], batch size: 68, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:00:07,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1583260.0, ans=0.0 2024-08-12 10:00:10,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1583260.0, ans=0.125 2024-08-12 10:00:10,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1583260.0, ans=0.125 2024-08-12 10:00:27,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1583460.0, ans=0.125 2024-08-12 10:00:30,226 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 10:00:30,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1583460.0, ans=0.125 2024-08-12 10:00:31,475 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 10:00:41,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1583560.0, ans=0.125 2024-08-12 10:00:53,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1583560.0, ans=0.125 2024-08-12 10:00:58,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1583660.0, ans=0.2 2024-08-12 10:01:04,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-12 10:01:05,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1583660.0, ans=0.125 2024-08-12 10:01:08,028 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.870e-01 2024-08-12 10:01:10,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13450, loss[loss=0.1204, beats_loss=0.009869, ecapa_loss=0.0001715, whisper_loss=0.1089, over 18232.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001783, whisper_loss=0.09209, over 3858586.98 frames. ], batch size: 68, lr: 5.63e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:01:10,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1583760.0, ans=0.035 2024-08-12 10:01:22,911 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-12 10:01:27,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-12 10:01:33,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1583860.0, ans=0.0 2024-08-12 10:01:41,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:01:45,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1583960.0, ans=0.0 2024-08-12 10:01:45,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.25 vs. limit=22.5 2024-08-12 10:01:51,536 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 10:02:09,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-12 10:02:11,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.426e+01 2.699e+01 3.096e+01 4.776e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-12 10:02:18,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1584260.0, ans=0.2 2024-08-12 10:02:19,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13500, loss[loss=0.1061, beats_loss=0.01422, ecapa_loss=0.0001277, whisper_loss=0.09061, over 23407.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.000179, whisper_loss=0.09184, over 3861510.25 frames. ], batch size: 91, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:02:20,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1584260.0, ans=0.125 2024-08-12 10:02:34,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1584360.0, ans=0.0 2024-08-12 10:02:36,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0 2024-08-12 10:02:44,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=15.0 2024-08-12 10:03:11,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1584560.0, ans=0.125 2024-08-12 10:03:15,366 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 10:03:16,734 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 10:03:24,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-08-12 10:03:30,126 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13550, loss[loss=0.09578, beats_loss=0.01076, ecapa_loss=0.0002329, whisper_loss=0.0827, over 18946.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01104, ecapa_loss=0.000179, whisper_loss=0.09193, over 3854456.67 frames. ], batch size: 80, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:03:32,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1584760.0, ans=0.125 2024-08-12 10:03:54,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2024-08-12 10:03:56,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1584860.0, ans=0.0 2024-08-12 10:04:14,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1585060.0, ans=0.125 2024-08-12 10:04:26,390 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 10:04:31,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.170e+01 2.628e+01 2.875e+01 3.352e+01 5.913e+01, threshold=5.750e+01, percent-clipped=1.0 2024-08-12 10:04:40,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13600, loss[loss=0.09206, beats_loss=0.01331, ecapa_loss=0.0001431, whisper_loss=0.07731, over 23346.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01112, ecapa_loss=0.0001788, whisper_loss=0.09161, over 3882698.40 frames. 
], batch size: 93, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:04:47,203 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 19 from Vox, 53 fro AS 2024-08-12 10:04:51,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1585260.0, ans=0.125 2024-08-12 10:04:52,743 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-12 10:04:59,354 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 10:05:04,931 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-12 10:05:07,954 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:05:12,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1585460.0, ans=0.125 2024-08-12 10:05:48,741 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13650, loss[loss=0.1099, beats_loss=0.009603, ecapa_loss=0.000222, whisper_loss=0.09807, over 17710.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01112, ecapa_loss=0.0001797, whisper_loss=0.09185, over 3879031.78 frames. ], batch size: 72, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:06:06,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1585860.0, ans=0.2 2024-08-12 10:06:07,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1585860.0, ans=0.0 2024-08-12 10:06:08,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1585860.0, ans=0.0 2024-08-12 10:06:31,001 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 10:06:38,231 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 10:06:50,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.547e+01 2.720e+01 3.156e+01 5.627e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 10:06:58,132 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 33 from Vox, 40 fro AS 2024-08-12 10:06:59,288 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13700, loss[loss=0.07345, beats_loss=0.01232, ecapa_loss=0.000218, whisper_loss=0.05895, over 21177.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.00018, whisper_loss=0.09184, over 3898024.26 frames. ], batch size: 93, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:07:28,361 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 10:07:48,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1586560.0, ans=0.0 2024-08-12 10:07:57,282 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-12 10:08:09,388 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13750, loss[loss=0.07874, beats_loss=0.01242, ecapa_loss=0.0001724, whisper_loss=0.0646, over 21011.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01104, ecapa_loss=0.0001796, whisper_loss=0.09228, over 3902338.43 frames. ], batch size: 85, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:08:18,144 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 10:08:23,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. 
limit=15.0 2024-08-12 10:08:25,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-12 10:08:26,838 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 10:08:32,640 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.483e-01 2024-08-12 10:08:35,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-08-12 10:08:42,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=12.0 2024-08-12 10:08:45,028 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 10:08:45,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1586960.0, ans=0.125 2024-08-12 10:09:00,635 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 10:09:08,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1587160.0, ans=0.1 2024-08-12 10:09:11,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.427e+01 2.784e+01 3.131e+01 5.573e+01, threshold=5.568e+01, percent-clipped=1.0 2024-08-12 10:09:12,041 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 9 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-12 10:09:16,280 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 10:09:20,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13800, loss[loss=0.1149, beats_loss=0.01006, ecapa_loss=0.0001726, whisper_loss=0.1031, over 22235.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001796, whisper_loss=0.09194, over 3902126.05 frames. ], batch size: 90, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:09:51,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1587460.0, ans=0.2 2024-08-12 10:09:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1587460.0, ans=0.125 2024-08-12 10:10:02,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1587560.0, ans=0.2 2024-08-12 10:10:12,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1587560.0, ans=0.0 2024-08-12 10:10:20,133 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 10:10:29,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-08-12 10:10:32,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13850, loss[loss=0.1199, beats_loss=0.008814, ecapa_loss=0.0002163, whisper_loss=0.109, over 16570.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.0001793, whisper_loss=0.09249, over 3914123.65 frames. ], batch size: 64, lr: 5.62e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:11:01,147 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 10:11:01,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1587960.0, ans=0.125 2024-08-12 10:11:08,446 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 10:11:28,285 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 10:11:35,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.570e+01 2.844e+01 3.264e+01 2.322e+02, threshold=5.688e+01, percent-clipped=2.0 2024-08-12 10:11:42,891 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-12 10:11:44,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13900, loss[loss=0.1024, beats_loss=0.01314, ecapa_loss=0.0001446, whisper_loss=0.08777, over 17608.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001799, whisper_loss=0.0926, over 3904008.25 frames. ], batch size: 69, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:11:46,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2024-08-12 10:11:55,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1588260.0, ans=0.2 2024-08-12 10:12:03,728 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 10:12:03,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1588360.0, ans=0.125 2024-08-12 10:12:22,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. 
limit=10.0 2024-08-12 10:12:31,157 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 10:13:00,949 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 13950, loss[loss=0.1043, beats_loss=0.01214, ecapa_loss=0.0001763, whisper_loss=0.09042, over 22855.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.011, ecapa_loss=0.0001791, whisper_loss=0.09265, over 3906570.83 frames. ], batch size: 92, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:13:11,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1588760.0, ans=0.2 2024-08-12 10:13:16,331 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 10:13:36,562 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-12 10:13:59,904 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 10:14:14,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.447e+01 2.683e+01 3.149e+01 1.029e+02, threshold=5.366e+01, percent-clipped=1.0 2024-08-12 10:14:18,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2024-08-12 10:14:23,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14000, loss[loss=0.09598, beats_loss=0.01303, ecapa_loss=0.0001532, whisper_loss=0.08142, over 19626.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01102, ecapa_loss=0.0001768, whisper_loss=0.0928, over 3908544.98 frames. 
], batch size: 76, lr: 5.62e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:14:24,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1589260.0, ans=0.2 2024-08-12 10:14:29,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1589260.0, ans=0.125 2024-08-12 10:14:41,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-08-12 10:14:51,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1589360.0, ans=0.125 2024-08-12 10:14:51,537 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.468e+01 2024-08-12 10:15:00,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1589460.0, ans=0.1 2024-08-12 10:15:00,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.70 vs. limit=15.0 2024-08-12 10:15:03,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. limit=6.0 2024-08-12 10:15:34,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1589660.0, ans=0.2 2024-08-12 10:15:41,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14050, loss[loss=0.1155, beats_loss=0.01021, ecapa_loss=0.0002017, whisper_loss=0.1033, over 22259.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01101, ecapa_loss=0.0001762, whisper_loss=0.09305, over 3893004.32 frames. 
], batch size: 90, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:16:01,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1589860.0, ans=0.1 2024-08-12 10:16:02,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1589860.0, ans=0.2 2024-08-12 10:16:31,414 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-12 10:16:46,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-12 10:16:51,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1590160.0, ans=0.1 2024-08-12 10:16:54,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.562e+01 2.972e+01 3.503e+01 4.652e+01, threshold=5.944e+01, percent-clipped=0.0 2024-08-12 10:16:54,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1590160.0, ans=0.125 2024-08-12 10:17:02,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1590260.0, ans=0.2 2024-08-12 10:17:03,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14100, loss[loss=0.1082, beats_loss=0.008433, ecapa_loss=0.0002465, whisper_loss=0.09731, over 17227.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01104, ecapa_loss=0.0001752, whisper_loss=0.09331, over 3895422.52 frames. 
], batch size: 68, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:17:07,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1590260.0, ans=0.1 2024-08-12 10:17:31,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1590360.0, ans=0.125 2024-08-12 10:17:32,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1590360.0, ans=0.125 2024-08-12 10:17:33,300 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 10:17:37,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1590460.0, ans=0.125 2024-08-12 10:17:39,593 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-12 10:17:47,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1590460.0, ans=0.125 2024-08-12 10:18:12,879 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 10:18:16,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1590660.0, ans=0.125 2024-08-12 10:18:23,567 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14150, loss[loss=0.1332, beats_loss=0.007914, ecapa_loss=0.000195, whisper_loss=0.1234, over 22513.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01103, ecapa_loss=0.0001774, whisper_loss=0.09326, over 3913888.92 frames. 
], batch size: 91, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:18:28,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1590760.0, ans=0.0 2024-08-12 10:18:32,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1590760.0, ans=0.2 2024-08-12 10:18:40,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1590860.0, ans=0.0 2024-08-12 10:18:45,591 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 10:18:48,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1590860.0, ans=0.5 2024-08-12 10:18:59,963 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 10:19:01,746 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 10:19:22,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1591060.0, ans=0.0 2024-08-12 10:19:39,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.532e+01 2.801e+01 3.352e+01 7.282e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 10:19:41,193 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 10:19:49,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14200, loss[loss=0.09407, beats_loss=0.01434, ecapa_loss=0.0001763, whisper_loss=0.07797, over 16704.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01107, ecapa_loss=0.0001773, whisper_loss=0.09299, over 3896834.73 frames. 
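The `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` lines from optim.py report quantiles of recent gradient norms; in the logged values the threshold equals clipping_scale times the median (e.g. 2.0 × 2.972e+01 = 5.944e+01 in one line above). A sketch of that behaviour under the assumption of a sliding window of norms (class and method names hypothetical):

```python
import statistics
from collections import deque

class GradNormClipper:
    """Sliding-window gradient-norm clipper: the clipping threshold is
    clipping_scale times the median of recently observed norms, matching
    the 'Clipping_scale=2.0 ... threshold=...' log lines. A simplified
    sketch, not the actual optim.py implementation."""
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def observe(self, grad_norm):
        """Record a norm; return (was_clipped, scale_to_apply_to_grads)."""
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        clipped = grad_norm > threshold
        return clipped, min(1.0, threshold / grad_norm)

clipper = GradNormClipper()
# Quartile values from one logged line: min / q1 / median / q3 / max
for norm in (20.52, 25.62, 29.72, 35.03, 46.52):
    clipped, scale = clipper.observe(norm)
```

Under this scheme `percent-clipped` is simply the fraction of recent batches whose norm exceeded the median-based threshold.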
], batch size: 70, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:20:05,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1591360.0, ans=0.125 2024-08-12 10:20:48,041 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 10:21:11,671 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14250, loss[loss=0.07863, beats_loss=0.01311, ecapa_loss=0.0002163, whisper_loss=0.06335, over 13077.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01102, ecapa_loss=0.0001766, whisper_loss=0.0927, over 3898519.09 frames. ], batch size: 56, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:21:15,205 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 10:21:20,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1591760.0, ans=0.125 2024-08-12 10:21:21,256 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 38 from Vox, 27 fro AS 2024-08-12 10:21:26,793 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 10:21:27,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1591860.0, ans=0.125 2024-08-12 10:21:27,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. 
limit=15.0 2024-08-12 10:22:14,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1592160.0, ans=0.0 2024-08-12 10:22:21,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1592160.0, ans=0.09899494936611666 2024-08-12 10:22:22,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.447e+01 2.773e+01 3.183e+01 5.230e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-12 10:22:26,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1592160.0, ans=0.1 2024-08-12 10:22:28,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2024-08-12 10:22:33,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14300, loss[loss=0.1329, beats_loss=0.009494, ecapa_loss=0.0001812, whisper_loss=0.1216, over 23713.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01098, ecapa_loss=0.0001772, whisper_loss=0.09338, over 3897108.39 frames. ], batch size: 94, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:22:36,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1592260.0, ans=0.0 2024-08-12 10:22:38,067 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
20 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 10:22:38,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1592260.0, ans=0.125 2024-08-12 10:22:40,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1592260.0, ans=0.1 2024-08-12 10:23:12,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1592460.0, ans=0.0 2024-08-12 10:23:14,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1592460.0, ans=0.1 2024-08-12 10:23:19,744 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 10:23:32,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1592560.0, ans=0.0 2024-08-12 10:23:42,673 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 10:23:55,902 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14350, loss[loss=0.1104, beats_loss=0.01112, ecapa_loss=0.000172, whisper_loss=0.09751, over 21100.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.000176, whisper_loss=0.09283, over 3905865.80 frames. 
], batch size: 86, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:23:56,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1592760.0, ans=0.04949747468305833 2024-08-12 10:24:09,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1592760.0, ans=0.125 2024-08-12 10:24:36,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1592960.0, ans=0.125 2024-08-12 10:24:44,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2024-08-12 10:24:48,938 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.358e-02 2024-08-12 10:24:50,323 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 10:24:57,607 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 10:25:00,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.480e+01 2.799e+01 3.080e+01 4.714e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 10:25:01,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1593160.0, ans=0.125 2024-08-12 10:25:05,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1593160.0, ans=0.125 2024-08-12 10:25:08,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14400, loss[loss=0.1109, beats_loss=0.01098, ecapa_loss=0.000181, whisper_loss=0.09811, over 22698.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01107, ecapa_loss=0.000178, whisper_loss=0.09289, over 3906440.67 frames. 
], batch size: 89, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:25:24,901 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 10:25:38,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1593460.0, ans=0.125 2024-08-12 10:25:41,612 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 10:25:52,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1593560.0, ans=0.125 2024-08-12 10:25:54,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1593560.0, ans=0.125 2024-08-12 10:26:13,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1593660.0, ans=0.125 2024-08-12 10:26:19,168 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 10:26:19,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1593660.0, ans=0.125 2024-08-12 10:26:21,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 11, batch 14450, loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.000207, whisper_loss=0.08966, over 20499.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0111, ecapa_loss=0.0001783, whisper_loss=0.09247, over 3878929.06 frames. ], batch size: 86, lr: 5.61e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:26:29,111 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 10:26:39,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1593860.0, ans=0.125 2024-08-12 10:26:46,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1593860.0, ans=0.0 2024-08-12 10:26:46,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1593860.0, ans=0.025 2024-08-12 10:26:48,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1593860.0, ans=0.025 2024-08-12 10:27:14,148 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-11.pt 2024-08-12 10:27:48,680 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 0, loss[loss=0.104, beats_loss=0.01099, ecapa_loss=0.0002003, whisper_loss=0.09101, over 21808.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01099, ecapa_loss=0.0002003, whisper_loss=0.09101, over 21808.00 frames. 
], batch size: 91, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:27:48,682 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 10:27:58,444 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8614, 2.8409, 3.1134, 3.3674], device='cuda:0') 2024-08-12 10:28:20,700 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4942, 2.5163, 2.9032, 1.8806], device='cuda:0') 2024-08-12 10:28:26,833 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on ASR_libri: loss=0.2553, beats_loss=0, ecapa_loss=0.0005949, whisper_loss=0.2493, over 922467.00 frames. 2024-08-12 10:28:43,376 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on SV_voxceleb1: loss=0.004912, beats_loss=0, ecapa_loss=0.0004912, whisper_loss=0, over 939242.00 frames. 2024-08-12 10:30:37,354 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7587, 3.6903, 3.7543, 3.1342], device='cuda:0') 2024-08-12 10:30:40,435 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on AT_audioset: loss=0.02433, beats_loss=0.02433, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 10:30:40,440 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 10:30:40,597 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
14 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 10:30:56,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1594110.0, ans=0.5 2024-08-12 10:30:59,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.491e+01 2.893e+01 3.197e+01 9.364e+01, threshold=5.786e+01, percent-clipped=1.0 2024-08-12 10:31:01,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594210.0, ans=0.1 2024-08-12 10:31:36,064 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 10:31:52,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-08-12 10:31:54,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-12 10:32:04,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594510.0, ans=0.1 2024-08-12 10:32:09,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1594510.0, ans=0.125 2024-08-12 10:32:09,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-12 10:32:11,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1594510.0, ans=0.125 2024-08-12 10:32:24,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 50, loss[loss=0.1125, beats_loss=0.009516, ecapa_loss=0.0001783, whisper_loss=0.1012, over 21297.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001811, whisper_loss=0.09002, over 890980.45 frames. ], batch size: 79, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:32:48,731 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 10:32:56,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1594710.0, ans=0.125 2024-08-12 10:32:57,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2024-08-12 10:33:01,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1594710.0, ans=0.0 2024-08-12 10:33:24,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1594810.0, ans=0.1 2024-08-12 10:33:24,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1594810.0, ans=0.1 2024-08-12 10:33:38,670 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:33:43,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1594910.0, ans=0.2 2024-08-12 10:33:51,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1595010.0, ans=0.0 2024-08-12 10:34:14,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 100, loss[loss=0.1227, beats_loss=0.0106, ecapa_loss=0.0001442, whisper_loss=0.1106, over 18914.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01066, ecapa_loss=0.0001772, whisper_loss=0.08823, over 1549483.85 frames. 
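The validation pass logged above evaluates each task separately, and in each task only the relevant loss is non-zero: ASR_libri reports beats_loss=0, SV_voxceleb1 reports only ecapa_loss, and AT_audioset reports only beats_loss. A sketch of that task-dependent masking; the mask values are inferred from which losses are zero in the log, not read from the source code:

```python
# Active (beats, ecapa, whisper) losses per validation task,
# inferred from the zeros in the logged validation lines.
TASK_MASKS = {
    "ASR_libri":    (0.0, 1.0, 1.0),
    "SV_voxceleb1": (0.0, 1.0, 0.0),
    "AT_audioset":  (1.0, 0.0, 0.0),
}

def masked_losses(task, beats_loss, ecapa_loss, whisper_loss):
    """Zero out the loss components that do not apply to a task
    (illustrative helper mirroring the validation log lines)."""
    mb, me, mw = TASK_MASKS[task]
    return mb * beats_loss, me * ecapa_loss, mw * whisper_loss
```

For example, the logged SV_voxceleb1 validation loss (0.004912) is exactly its ecapa component, with beats and whisper masked to zero.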
], batch size: 72, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:34:21,412 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 10:34:34,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+01 2.774e+01 3.018e+01 3.442e+01 6.372e+01, threshold=6.036e+01, percent-clipped=2.0 2024-08-12 10:34:46,349 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 10:35:18,047 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 10:35:18,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1595310.0, ans=0.125 2024-08-12 10:35:20,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595410.0, ans=0.1 2024-08-12 10:35:26,748 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 10:35:42,216 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 10:35:53,916 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-12 10:36:03,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 150, loss[loss=0.0921, beats_loss=0.01069, ecapa_loss=0.0001597, whisper_loss=0.07981, over 15728.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001787, whisper_loss=0.08942, over 2062642.68 frames. ], batch size: 60, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:36:04,014 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
24 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 10:36:26,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1595710.0, ans=0.0 2024-08-12 10:36:48,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1595810.0, ans=0.0 2024-08-12 10:37:04,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1595910.0, ans=0.1 2024-08-12 10:37:06,277 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-12 10:37:09,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1595910.0, ans=0.0 2024-08-12 10:37:11,370 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 10:37:13,018 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 10:37:15,020 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 10:37:17,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1596010.0, ans=0.125 2024-08-12 10:37:31,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 200, loss[loss=0.1134, beats_loss=0.008704, ecapa_loss=0.0002177, whisper_loss=0.1025, over 17767.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001793, whisper_loss=0.09017, over 2438539.42 frames. 
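Each `tot_loss[... over N frames.]` entry is an aggregate over the frames seen so far in the epoch (890980 frames at batch 50, ≈1.55M at batch 100, and so on). A minimal sketch of a frame-weighted running average, assuming simple accumulation; icefall's actual MetricsTracker additionally decays older batches, which this omits:

```python
class FrameWeightedAverage:
    """Frame-weighted running average of a loss value, in the spirit of
    the 'tot_loss[...] over N frames' log entries (simplified sketch)."""
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0

    def update(self, loss, num_frames):
        # Weight each batch's loss by its frame count so long and short
        # batches contribute proportionally.
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def average(self):
        return self.loss_sum / self.frames

tracker = FrameWeightedAverage()
tracker.update(0.1, 100)
tracker.update(0.2, 300)
```

Frame-weighting explains why a single bad small batch barely moves `tot_loss` late in an epoch: its frames are a tiny fraction of the millions already accumulated.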
], batch size: 71, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:37:49,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.189e+01 2.723e+01 2.993e+01 3.587e+01 5.466e+01, threshold=5.985e+01, percent-clipped=0.0 2024-08-12 10:38:25,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1596410.0, ans=0.125 2024-08-12 10:38:27,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-12 10:38:59,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 250, loss[loss=0.1062, beats_loss=0.0119, ecapa_loss=0.0001331, whisper_loss=0.093, over 20471.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01069, ecapa_loss=0.0001788, whisper_loss=0.09039, over 2748175.01 frames. ], batch size: 76, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:39:20,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1596710.0, ans=0.0 2024-08-12 10:39:30,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1596810.0, ans=0.125 2024-08-12 10:39:43,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1596810.0, ans=0.0 2024-08-12 10:39:47,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1596910.0, ans=0.0 2024-08-12 10:40:01,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1596910.0, ans=0.125 2024-08-12 10:40:06,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.63 vs. 
limit=12.0 2024-08-12 10:40:08,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2024-08-12 10:40:19,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 300, loss[loss=0.0958, beats_loss=0.009096, ecapa_loss=0.0001727, whisper_loss=0.08497, over 16713.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01084, ecapa_loss=0.0001773, whisper_loss=0.08999, over 2969905.71 frames. ], batch size: 63, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:40:34,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.533e+01 2.859e+01 3.181e+01 4.204e+01, threshold=5.718e+01, percent-clipped=0.0 2024-08-12 10:40:39,844 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 10:41:09,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.45 vs. limit=22.5 2024-08-12 10:41:11,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1597410.0, ans=0.1 2024-08-12 10:41:14,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1597410.0, ans=0.0 2024-08-12 10:41:25,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1597510.0, ans=0.125 2024-08-12 10:41:39,840 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 350, loss[loss=0.07016, beats_loss=0.0142, ecapa_loss=0.0001478, whisper_loss=0.05448, over 19559.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01092, ecapa_loss=0.0001765, whisper_loss=0.08955, over 3173899.96 frames. ], batch size: 81, lr: 5.37e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:41:44,724 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
18 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 10:41:45,049 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.204e+02 2024-08-12 10:42:04,937 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 10:42:23,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1597810.0, ans=0.125 2024-08-12 10:42:26,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1597910.0, ans=0.1 2024-08-12 10:42:32,904 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 10:42:39,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1597910.0, ans=0.125 2024-08-12 10:42:40,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1598010.0, ans=0.5 2024-08-12 10:42:51,519 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 10:42:57,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 400, loss[loss=0.106, beats_loss=0.01064, ecapa_loss=0.0002147, whisper_loss=0.09325, over 20496.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01086, ecapa_loss=0.0001762, whisper_loss=0.09055, over 3339416.15 frames. 
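The recurring `A total of N cuts. X from LS+wenet, Y from Vox, Z fro AS` messages report the per-dataset composition of each multiplexed batch ("fro AS" appears to be a typo for "from AS" in the logging format string, reproduced verbatim here). A sketch of how such a summary line can be produced from per-cut origin labels; the function is illustrative, though the origin labels come from the log itself:

```python
from collections import Counter

def summarize_batch(cut_origins):
    """Format a batch-composition line in the style of the training
    log's 'A total of N cuts. ...' messages (with the 'fro AS' typo
    corrected to 'from AS'). Illustrative helper, not the real code."""
    counts = Counter(cut_origins)
    return (f"A total of {sum(counts.values())} cuts. "
            f"{counts['LS+wenet']} from LS+wenet, "
            f"{counts['Vox']} from Vox, "
            f"{counts['AS']} from AS")

# Composition matching one logged batch (72 cuts)
line = summarize_batch(["LS+wenet"] * 20 + ["Vox"] * 17 + ["AS"] * 35)
```

These counts make the dataset mixing ratio auditable batch by batch: with AudioSet unbalanced and LibriSpeech repeated 5x, each batch draws from all three sources.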
], batch size: 85, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:42:59,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1598110.0, ans=0.1 2024-08-12 10:43:11,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.512e+01 2.716e+01 3.145e+01 4.909e+01, threshold=5.433e+01, percent-clipped=0.0 2024-08-12 10:43:45,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1598410.0, ans=0.0 2024-08-12 10:44:02,927 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 10:44:15,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 450, loss[loss=0.1326, beats_loss=0.007729, ecapa_loss=0.0001668, whisper_loss=0.1232, over 20109.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0108, ecapa_loss=0.0001758, whisper_loss=0.09078, over 3419179.50 frames. ], batch size: 75, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:44:36,857 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 10:45:28,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1599010.0, ans=0.2 2024-08-12 10:45:32,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 500, loss[loss=0.1214, beats_loss=0.008185, ecapa_loss=0.0001752, whisper_loss=0.1114, over 14821.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001734, whisper_loss=0.09144, over 3514971.44 frames. 
], batch size: 55, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:45:37,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1599110.0, ans=0.09899494936611666 2024-08-12 10:45:39,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1599110.0, ans=0.1 2024-08-12 10:45:46,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.446e+01 2.825e+01 3.305e+01 5.621e+01, threshold=5.651e+01, percent-clipped=2.0 2024-08-12 10:46:00,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2024-08-12 10:46:03,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-12 10:46:13,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1599310.0, ans=0.125 2024-08-12 10:46:20,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1599410.0, ans=0.1 2024-08-12 10:46:43,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1599510.0, ans=0.0 2024-08-12 10:46:44,429 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 41 from LS+wenet, 8 from Vox, 42 fro AS 2024-08-12 10:46:46,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1599510.0, ans=0.07 2024-08-12 10:46:52,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 550, loss[loss=0.1139, beats_loss=0.01067, ecapa_loss=0.000173, whisper_loss=0.1015, over 14577.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001736, whisper_loss=0.09166, over 3583700.90 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:47:13,373 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-12 10:47:31,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-12 10:47:39,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2024-08-12 10:47:42,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1599910.0, ans=0.0 2024-08-12 10:47:52,803 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 10:47:53,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1599910.0, ans=0.07 2024-08-12 10:47:54,336 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-160000.pt 2024-08-12 10:48:06,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1600010.0, ans=0.0 2024-08-12 10:48:14,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 600, loss[loss=0.09758, beats_loss=0.01089, ecapa_loss=0.0001946, whisper_loss=0.08475, over 15127.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01079, ecapa_loss=0.0001736, whisper_loss=0.09254, over 3655948.28 frames. 
], batch size: 58, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:48:21,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1600110.0, ans=0.1 2024-08-12 10:48:28,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.536e+01 2.795e+01 3.405e+01 6.348e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-12 10:48:32,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1600210.0, ans=0.2 2024-08-12 10:48:45,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-12 10:49:18,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1600510.0, ans=0.125 2024-08-12 10:49:25,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1600510.0, ans=0.1 2024-08-12 10:49:26,622 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-12 10:49:31,235 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 650, loss[loss=0.109, beats_loss=0.01155, ecapa_loss=0.0001682, whisper_loss=0.09579, over 18680.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01084, ecapa_loss=0.0001729, whisper_loss=0.09252, over 3701129.34 frames. 
], batch size: 73, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:49:34,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1600610.0, ans=0.1 2024-08-12 10:50:00,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1600710.0, ans=0.0 2024-08-12 10:50:06,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. limit=6.0 2024-08-12 10:50:44,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1601010.0, ans=0.0 2024-08-12 10:50:52,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 700, loss[loss=0.1061, beats_loss=0.01079, ecapa_loss=0.0001788, whisper_loss=0.09353, over 17319.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01081, ecapa_loss=0.0001739, whisper_loss=0.0925, over 3712807.70 frames. ], batch size: 70, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:51:03,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-12 10:51:04,826 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 24 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-12 10:51:06,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.075e+01 2.429e+01 2.647e+01 2.906e+01 4.054e+01, threshold=5.293e+01, percent-clipped=0.0 2024-08-12 10:51:34,891 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 10:51:55,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1601510.0, ans=0.0 2024-08-12 10:52:10,689 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 750, loss[loss=0.1195, beats_loss=0.01019, ecapa_loss=0.0001677, whisper_loss=0.1076, over 20830.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01084, ecapa_loss=0.000173, whisper_loss=0.09297, over 3759235.14 frames. ], batch size: 83, lr: 5.36e-03, grad_scale: 1.152921504606847e+18 2024-08-12 10:52:31,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1601710.0, ans=0.1 2024-08-12 10:52:45,931 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-12 10:52:51,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1601810.0, ans=0.0 2024-08-12 10:52:57,148 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 10:53:01,857 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 10:53:09,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1601910.0, ans=0.1 2024-08-12 10:53:12,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1602010.0, ans=0.0 2024-08-12 10:53:16,896 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 32 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 10:53:21,757 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 10:53:29,897 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 800, loss[loss=0.08123, beats_loss=0.01151, ecapa_loss=0.0002048, whisper_loss=0.06767, over 16559.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01077, ecapa_loss=0.0001735, whisper_loss=0.09238, over 3752384.47 frames. ], batch size: 68, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:53:34,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-12 10:53:43,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-12 10:53:45,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.463e+01 2.797e+01 3.235e+01 6.542e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 10:54:10,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1602310.0, ans=0.07 2024-08-12 10:54:14,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1602310.0, ans=0.0 2024-08-12 10:54:24,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.03 vs. limit=10.0 2024-08-12 10:54:26,942 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 19 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 10:54:50,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 850, loss[loss=0.1222, beats_loss=0.01027, ecapa_loss=0.0001729, whisper_loss=0.1102, over 23515.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001719, whisper_loss=0.09206, over 3800275.56 frames. 
], batch size: 90, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:55:06,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-12 10:55:16,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1602710.0, ans=0.125 2024-08-12 10:55:24,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1602810.0, ans=0.125 2024-08-12 10:55:24,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1602810.0, ans=0.1 2024-08-12 10:55:53,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1603010.0, ans=0.035 2024-08-12 10:55:55,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-12 10:56:06,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1603010.0, ans=0.125 2024-08-12 10:56:09,042 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 900, loss[loss=0.1134, beats_loss=0.008283, ecapa_loss=0.0001653, whisper_loss=0.1035, over 21599.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001721, whisper_loss=0.09196, over 3783703.30 frames. ], batch size: 83, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:56:17,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.58 vs. 
limit=15.0 2024-08-12 10:56:29,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.446e+01 2.685e+01 3.025e+01 4.659e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 10:56:46,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1603310.0, ans=0.125 2024-08-12 10:56:53,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1603310.0, ans=0.125 2024-08-12 10:56:56,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1603310.0, ans=0.2 2024-08-12 10:56:58,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-08-12 10:57:05,058 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 10:57:13,372 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 10:57:23,705 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 10:57:33,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 950, loss[loss=0.09541, beats_loss=0.01051, ecapa_loss=0.0001914, whisper_loss=0.08299, over 19681.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.000173, whisper_loss=0.09169, over 3800157.86 frames. ], batch size: 81, lr: 5.36e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:57:49,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1603610.0, ans=0.0 2024-08-12 10:57:50,659 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 10:57:51,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.16 vs. limit=22.5 2024-08-12 10:58:24,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1603810.0, ans=0.125 2024-08-12 10:58:28,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1603810.0, ans=0.1 2024-08-12 10:58:36,193 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 10:58:43,335 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 10:58:46,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2024-08-12 10:58:54,408 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 10:59:03,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=15.0 2024-08-12 10:59:10,595 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1000, loss[loss=0.09843, beats_loss=0.01002, ecapa_loss=0.0001648, whisper_loss=0.08676, over 17571.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01081, ecapa_loss=0.0001727, whisper_loss=0.09122, over 3800320.31 frames. ], batch size: 72, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 10:59:13,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5 2024-08-12 10:59:23,499 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 10:59:32,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.574e+01 2.849e+01 3.275e+01 5.377e+01, threshold=5.697e+01, percent-clipped=1.0 2024-08-12 10:59:38,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=1604210.0, ans=22.5 2024-08-12 10:59:42,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1604210.0, ans=0.125 2024-08-12 11:00:03,200 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 11:00:32,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1604510.0, ans=0.125 2024-08-12 11:00:32,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1604510.0, ans=0.2 2024-08-12 11:00:58,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1050, loss[loss=0.1023, beats_loss=0.009243, ecapa_loss=0.0001566, whisper_loss=0.09146, over 14688.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.0001711, whisper_loss=0.09106, over 3809150.63 frames. 
], batch size: 57, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:01:19,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1604610.0, ans=15.0 2024-08-12 11:01:29,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1604710.0, ans=0.1 2024-08-12 11:01:50,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1604810.0, ans=0.0 2024-08-12 11:01:56,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1604810.0, ans=0.125 2024-08-12 11:02:04,068 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 11:02:14,483 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 11:02:14,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1604910.0, ans=0.1 2024-08-12 11:02:20,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1604910.0, ans=0.07 2024-08-12 11:02:27,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2024-08-12 11:02:42,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.03 vs. 
limit=15.0 2024-08-12 11:02:46,702 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.195e-02 2024-08-12 11:02:49,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1605010.0, ans=0.0 2024-08-12 11:03:01,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1100, loss[loss=0.09528, beats_loss=0.009427, ecapa_loss=0.0001947, whisper_loss=0.0839, over 13897.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01073, ecapa_loss=0.0001716, whisper_loss=0.09158, over 3790001.92 frames. ], batch size: 58, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:03:27,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.546e+01 2.827e+01 3.274e+01 5.638e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 11:03:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1605210.0, ans=0.125 2024-08-12 11:03:53,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1605310.0, ans=0.1 2024-08-12 11:03:57,806 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 11:05:09,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1150, loss[loss=0.1132, beats_loss=0.01004, ecapa_loss=0.0002088, whisper_loss=0.1011, over 21565.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001719, whisper_loss=0.09182, over 3794406.59 frames. ], batch size: 87, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:05:24,747 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
17 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 11:05:25,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1605610.0, ans=0.125 2024-08-12 11:05:36,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1605610.0, ans=0.1 2024-08-12 11:05:41,362 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 11:06:07,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1605810.0, ans=0.0 2024-08-12 11:06:25,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1605810.0, ans=0.125 2024-08-12 11:06:35,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2024-08-12 11:06:39,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1605910.0, ans=0.2 2024-08-12 11:07:01,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.83 vs. limit=22.5 2024-08-12 11:07:05,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1606010.0, ans=0.125 2024-08-12 11:07:10,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1606010.0, ans=0.0 2024-08-12 11:07:14,628 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1200, loss[loss=0.08988, beats_loss=0.009151, ecapa_loss=0.0002242, whisper_loss=0.07849, over 13847.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01086, ecapa_loss=0.0001717, whisper_loss=0.09084, over 3776900.06 frames. 
], batch size: 56, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:07:15,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1606110.0, ans=15.0 2024-08-12 11:07:19,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1606110.0, ans=0.125 2024-08-12 11:07:36,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.362e+01 2.610e+01 2.988e+01 4.824e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-12 11:07:44,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-08-12 11:08:02,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.90 vs. limit=22.5 2024-08-12 11:08:29,205 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 11:08:33,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1606410.0, ans=0.125 2024-08-12 11:08:35,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1606410.0, ans=0.0 2024-08-12 11:09:03,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1250, loss[loss=0.09404, beats_loss=0.008889, ecapa_loss=0.0002212, whisper_loss=0.08294, over 14272.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001708, whisper_loss=0.09064, over 3763256.83 frames. 
], batch size: 60, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:09:06,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1606610.0, ans=0.1 2024-08-12 11:09:15,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1606610.0, ans=0.125 2024-08-12 11:09:23,151 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 11:09:45,685 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 11:10:02,150 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-12 11:10:02,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-12 11:10:07,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1606910.0, ans=0.125 2024-08-12 11:10:28,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1300, loss[loss=0.09933, beats_loss=0.0128, ecapa_loss=0.0001469, whisper_loss=0.08506, over 22746.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001699, whisper_loss=0.09128, over 3804246.22 frames. 
], batch size: 91, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:10:44,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.467e+01 2.705e+01 3.116e+01 5.074e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-12 11:11:40,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1607510.0, ans=0.0 2024-08-12 11:11:49,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1350, loss[loss=0.07992, beats_loss=0.01407, ecapa_loss=0.0001568, whisper_loss=0.06429, over 16429.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.000169, whisper_loss=0.09041, over 3792068.78 frames. ], batch size: 66, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:12:10,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=12.0 2024-08-12 11:12:14,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1607710.0, ans=0.125 2024-08-12 11:12:23,588 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 11:12:41,682 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-12 11:12:55,269 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 11:12:55,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1608010.0, ans=0.2 2024-08-12 11:13:06,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. 
limit=15.0 2024-08-12 11:13:11,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1400, loss[loss=0.1169, beats_loss=0.01174, ecapa_loss=0.0001338, whisper_loss=0.1039, over 18744.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001685, whisper_loss=0.09072, over 3821414.19 frames. ], batch size: 71, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:13:13,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1608110.0, ans=0.0 2024-08-12 11:13:18,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.56 vs. limit=10.0 2024-08-12 11:13:27,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.408e+01 2.816e+01 3.296e+01 5.087e+01, threshold=5.632e+01, percent-clipped=0.0 2024-08-12 11:13:28,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1608210.0, ans=0.0 2024-08-12 11:13:32,179 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 11:13:41,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1608210.0, ans=0.0 2024-08-12 11:13:47,512 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-12 11:14:05,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1608410.0, ans=0.125 2024-08-12 11:14:15,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1608410.0, ans=0.2 2024-08-12 11:14:17,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1608510.0, ans=0.2 2024-08-12 11:14:34,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1450, loss[loss=0.09876, beats_loss=0.01133, ecapa_loss=0.0001376, whisper_loss=0.08605, over 17493.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01095, ecapa_loss=0.0001684, whisper_loss=0.09032, over 3832127.86 frames. ], batch size: 68, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:15:15,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1608710.0, ans=0.125 2024-08-12 11:15:18,584 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 11:15:19,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1608710.0, ans=0.1 2024-08-12 11:15:20,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1608710.0, ans=0.125 2024-08-12 11:15:20,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.07 vs. 
limit=15.0 2024-08-12 11:15:36,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1608810.0, ans=0.125 2024-08-12 11:15:38,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1608810.0, ans=0.0 2024-08-12 11:15:52,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1608910.0, ans=0.2 2024-08-12 11:16:14,346 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 11:16:22,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1500, loss[loss=0.08544, beats_loss=0.01066, ecapa_loss=0.0001871, whisper_loss=0.0729, over 18691.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0111, ecapa_loss=0.0001675, whisper_loss=0.08935, over 3830675.96 frames. ], batch size: 78, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:16:24,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.09 vs. limit=22.5 2024-08-12 11:16:33,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1609110.0, ans=0.0 2024-08-12 11:16:38,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.429e+01 2.735e+01 3.054e+01 5.898e+01, threshold=5.470e+01, percent-clipped=1.0 2024-08-12 11:16:39,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-08-12 11:16:56,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1609310.0, ans=0.125 2024-08-12 11:17:14,080 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
22 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 11:17:14,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1609310.0, ans=10.0 2024-08-12 11:17:18,206 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 11:17:23,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1609410.0, ans=0.2 2024-08-12 11:17:30,563 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-12 11:17:35,346 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 11:17:52,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1550, loss[loss=0.1103, beats_loss=0.01104, ecapa_loss=0.0001845, whisper_loss=0.09741, over 18807.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01113, ecapa_loss=0.0001667, whisper_loss=0.08966, over 3821875.75 frames. ], batch size: 75, lr: 5.35e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:18:02,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-12 11:18:38,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=12.0 2024-08-12 11:19:19,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1600, loss[loss=0.1304, beats_loss=0.008242, ecapa_loss=0.0001973, whisper_loss=0.1202, over 22806.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01107, ecapa_loss=0.0001669, whisper_loss=0.08983, over 3858395.54 frames. ], batch size: 90, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:19:28,534 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 11:19:28,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1610110.0, ans=0.125 2024-08-12 11:19:34,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1610110.0, ans=0.125 2024-08-12 11:19:36,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.594e+01 2.878e+01 3.251e+01 6.117e+01, threshold=5.756e+01, percent-clipped=2.0 2024-08-12 11:19:38,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-12 11:19:47,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1610210.0, ans=0.015 2024-08-12 11:19:50,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1610210.0, ans=0.1 2024-08-12 11:20:16,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1610410.0, ans=0.1 2024-08-12 11:20:19,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. 
limit=6.0 2024-08-12 11:20:31,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1610510.0, ans=0.125 2024-08-12 11:20:33,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1610510.0, ans=0.0 2024-08-12 11:20:40,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1610510.0, ans=0.0 2024-08-12 11:20:45,795 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1650, loss[loss=0.09319, beats_loss=0.01207, ecapa_loss=0.0001776, whisper_loss=0.07935, over 21169.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01097, ecapa_loss=0.0001688, whisper_loss=0.09082, over 3880289.07 frames. ], batch size: 90, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:21:10,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1610710.0, ans=0.05 2024-08-12 11:21:22,786 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 from AS 2024-08-12 11:21:41,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-12 11:21:51,115 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 11:22:00,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1611010.0, ans=0.125 2024-08-12 11:22:08,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1700, loss[loss=0.08891, beats_loss=0.008427, ecapa_loss=0.0001559, whisper_loss=0.07892, over 15068.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01091, ecapa_loss=0.0001698, whisper_loss=0.09104, over 3861679.38 frames.
], batch size: 57, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:22:10,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1611110.0, ans=0.0 2024-08-12 11:22:24,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.098e+01 2.487e+01 2.798e+01 3.265e+01 1.299e+02, threshold=5.596e+01, percent-clipped=2.0 2024-08-12 11:22:25,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1611210.0, ans=0.0 2024-08-12 11:22:35,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-08-12 11:22:39,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1611310.0, ans=0.2 2024-08-12 11:22:39,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1611310.0, ans=0.125 2024-08-12 11:22:41,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1611310.0, ans=0.125 2024-08-12 11:22:44,350 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 18 from Vox, 21 from AS 2024-08-12 11:22:57,386 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS 2024-08-12 11:23:04,055 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts.
23 from LS+wenet, 20 from Vox, 42 from AS 2024-08-12 11:23:25,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1611510.0, ans=0.0 2024-08-12 11:23:25,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1611510.0, ans=0.0 2024-08-12 11:23:29,978 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1750, loss[loss=0.09555, beats_loss=0.01044, ecapa_loss=0.0001743, whisper_loss=0.08337, over 16094.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.0001694, whisper_loss=0.09079, over 3855269.96 frames. ], batch size: 62, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:23:35,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1611610.0, ans=0.125 2024-08-12 11:23:37,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-12 11:23:51,407 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 from AS 2024-08-12 11:23:54,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1611710.0, ans=0.125 2024-08-12 11:24:03,795 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-12 11:24:14,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1611810.0, ans=0.125 2024-08-12 11:24:49,906 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1800, loss[loss=0.1153, beats_loss=0.009533, ecapa_loss=0.000152, whisper_loss=0.1043, over 17323.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01085, ecapa_loss=0.0001704, whisper_loss=0.09153, over 3876517.32 frames.
], batch size: 64, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:24:50,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1612110.0, ans=0.1 2024-08-12 11:25:05,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.474e+01 2.742e+01 2.995e+01 4.904e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-12 11:25:10,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1612210.0, ans=0.0 2024-08-12 11:25:55,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2024-08-12 11:26:02,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-12 11:26:14,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1850, loss[loss=0.08421, beats_loss=0.009755, ecapa_loss=0.0001722, whisper_loss=0.07273, over 15168.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001705, whisper_loss=0.09139, over 3893185.52 frames. 
], batch size: 59, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:26:30,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1612610.0, ans=0.125 2024-08-12 11:26:48,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1612710.0, ans=0.1 2024-08-12 11:27:22,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1612910.0, ans=0.0 2024-08-12 11:27:24,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1612910.0, ans=0.0 2024-08-12 11:27:24,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-12 11:27:44,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2024-08-12 11:27:58,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1613010.0, ans=0.125 2024-08-12 11:28:04,046 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1900, loss[loss=0.09373, beats_loss=0.01094, ecapa_loss=0.0001689, whisper_loss=0.0811, over 20884.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001712, whisper_loss=0.09114, over 3870426.07 frames. 
], batch size: 81, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:28:08,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1613110.0, ans=0.1 2024-08-12 11:28:09,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1613110.0, ans=0.125 2024-08-12 11:28:26,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.546e+01 2.864e+01 3.475e+01 5.350e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-12 11:28:27,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2024-08-12 11:28:47,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1613310.0, ans=0.0 2024-08-12 11:28:49,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1613310.0, ans=0.1 2024-08-12 11:29:19,939 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 from AS 2024-08-12 11:29:29,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.97 vs. limit=8.0 2024-08-12 11:29:44,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 1950, loss[loss=0.07742, beats_loss=0.01387, ecapa_loss=0.000149, whisper_loss=0.06205, over 18689.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01088, ecapa_loss=0.0001726, whisper_loss=0.09135, over 3831533.61 frames.
], batch size: 78, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:29:51,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1613610.0, ans=0.0 2024-08-12 11:30:04,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1613710.0, ans=0.125 2024-08-12 11:30:05,925 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 from AS 2024-08-12 11:30:17,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=1613810.0, ans=0.02 2024-08-12 11:30:26,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1613810.0, ans=0.0 2024-08-12 11:30:30,209 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 11:30:40,014 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 11:30:46,385 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 from AS 2024-08-12 11:30:59,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1614010.0, ans=0.05 2024-08-12 11:31:05,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2000, loss[loss=0.1218, beats_loss=0.007433, ecapa_loss=0.0001703, whisper_loss=0.1126, over 16392.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01085, ecapa_loss=0.0001733, whisper_loss=0.09173, over 3821888.39 frames.
], batch size: 61, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:31:19,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1614210.0, ans=0.125 2024-08-12 11:31:20,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.474e+01 2.700e+01 3.035e+01 6.607e+01, threshold=5.401e+01, percent-clipped=2.0 2024-08-12 11:31:28,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1614210.0, ans=0.1 2024-08-12 11:31:35,260 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 18 from Vox, 31 from AS 2024-08-12 11:31:37,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1614310.0, ans=0.125 2024-08-12 11:31:42,220 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 11:31:51,644 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 19 from Vox, 18 from AS 2024-08-12 11:31:53,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1614410.0, ans=0.125 2024-08-12 11:32:07,620 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 11:32:14,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1614510.0, ans=0.125 2024-08-12 11:32:15,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.36 vs.
limit=22.5 2024-08-12 11:32:23,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1614610.0, ans=0.125 2024-08-12 11:32:24,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2050, loss[loss=0.09685, beats_loss=0.01328, ecapa_loss=0.0001299, whisper_loss=0.08227, over 23162.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001733, whisper_loss=0.09194, over 3815281.53 frames. ], batch size: 92, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:32:28,959 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 from AS 2024-08-12 11:33:23,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-12 11:33:35,698 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 from AS 2024-08-12 11:33:43,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-12 11:33:46,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2100, loss[loss=0.09977, beats_loss=0.0108, ecapa_loss=0.0001712, whisper_loss=0.08726, over 20574.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.0001733, whisper_loss=0.09157, over 3790652.34 frames. ], batch size: 83, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:33:48,270 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts.
21 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 11:33:58,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1615110.0, ans=0.0 2024-08-12 11:34:02,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.515e+01 2.855e+01 3.226e+01 9.750e+01, threshold=5.709e+01, percent-clipped=2.0 2024-08-12 11:34:15,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1615210.0, ans=0.125 2024-08-12 11:34:19,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1615310.0, ans=0.05 2024-08-12 11:34:43,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1615410.0, ans=0.0 2024-08-12 11:34:44,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1615410.0, ans=0.025 2024-08-12 11:34:48,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1615510.0, ans=0.05 2024-08-12 11:35:04,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2150, loss[loss=0.08146, beats_loss=0.0133, ecapa_loss=0.000172, whisper_loss=0.06644, over 18054.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.0001743, whisper_loss=0.09233, over 3820726.02 frames. ], batch size: 74, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:35:06,094 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 15 from Vox, 37 from AS 2024-08-12 11:35:55,728 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts.
26 from LS+wenet, 17 from Vox, 41 from AS 2024-08-12 11:35:56,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2024-08-12 11:36:03,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1615910.0, ans=0.1 2024-08-12 11:36:15,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1616010.0, ans=0.125 2024-08-12 11:36:23,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2200, loss[loss=0.1059, beats_loss=0.009291, ecapa_loss=0.0001692, whisper_loss=0.09495, over 14732.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01093, ecapa_loss=0.0001728, whisper_loss=0.09275, over 3832461.53 frames. ], batch size: 58, lr: 5.34e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:36:40,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.491e+01 2.779e+01 3.104e+01 1.679e+02, threshold=5.558e+01, percent-clipped=1.0 2024-08-12 11:36:44,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1616210.0, ans=0.0 2024-08-12 11:36:44,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1616210.0, ans=0.125 2024-08-12 11:36:50,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1616210.0, ans=0.07 2024-08-12 11:37:19,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1616410.0, ans=0.0 2024-08-12 11:37:32,393 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts.
24 from LS+wenet, 28 from Vox, 39 from AS 2024-08-12 11:37:37,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.05 vs. limit=22.5 2024-08-12 11:37:44,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2250, loss[loss=0.1207, beats_loss=0.0106, ecapa_loss=0.0001475, whisper_loss=0.1086, over 19044.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01096, ecapa_loss=0.0001737, whisper_loss=0.09269, over 3860370.10 frames. ], batch size: 72, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:37:52,463 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 11 from Vox, 29 from AS 2024-08-12 11:37:52,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1616610.0, ans=0.125 2024-08-12 11:38:15,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1616710.0, ans=0.0 2024-08-12 11:38:19,844 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 from AS 2024-08-12 11:38:28,864 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 11 from Vox, 30 from AS 2024-08-12 11:38:31,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-08-12 11:38:47,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.01 vs.
limit=22.5 2024-08-12 11:38:51,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:38:53,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:38:54,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1617010.0, ans=0.125 2024-08-12 11:39:01,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1617010.0, ans=0.0 2024-08-12 11:39:05,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2300, loss[loss=0.08896, beats_loss=0.01053, ecapa_loss=0.000138, whisper_loss=0.07705, over 18225.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.011, ecapa_loss=0.0001745, whisper_loss=0.09259, over 3876872.61 frames. ], batch size: 70, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:39:08,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1617110.0, ans=0.125 2024-08-12 11:39:10,920 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 from AS 2024-08-12 11:39:22,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.197e+01 2.578e+01 2.776e+01 3.127e+01 7.036e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 11:39:30,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1617210.0, ans=0.0 2024-08-12 11:39:42,677 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 11:39:44,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs.
limit=15.0 2024-08-12 11:40:25,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2350, loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.000182, whisper_loss=0.08813, over 22298.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001753, whisper_loss=0.09228, over 3868079.50 frames. ], batch size: 89, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:40:29,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1617610.0, ans=0.0 2024-08-12 11:40:34,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-12 11:40:47,471 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 20 from Vox, 35 from AS 2024-08-12 11:41:12,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2024-08-12 11:41:15,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1617910.0, ans=0.125 2024-08-12 11:41:22,003 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 from AS 2024-08-12 11:41:34,894 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 from AS 2024-08-12 11:41:47,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2400, loss[loss=0.07656, beats_loss=0.009479, ecapa_loss=0.0001512, whisper_loss=0.06556, over 15987.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001752, whisper_loss=0.09185, over 3846716.81 frames.
], batch size: 59, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:42:03,074 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.471e+01 2.708e+01 3.082e+01 4.957e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-12 11:42:30,090 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS 2024-08-12 11:42:36,741 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 from AS 2024-08-12 11:42:39,874 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-12 11:42:52,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1618510.0, ans=0.5 2024-08-12 11:42:55,175 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS 2024-08-12 11:43:06,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2450, loss[loss=0.1079, beats_loss=0.009346, ecapa_loss=0.0001859, whisper_loss=0.0967, over 20012.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001757, whisper_loss=0.0917, over 3870798.20 frames. ], batch size: 75, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:43:08,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1618610.0, ans=0.125 2024-08-12 11:43:27,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1618710.0, ans=0.125 2024-08-12 11:43:29,965 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 from AS 2024-08-12 11:43:44,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs.
limit=15.0 2024-08-12 11:43:47,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1618810.0, ans=0.125 2024-08-12 11:44:10,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1618910.0, ans=0.1 2024-08-12 11:44:31,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1619010.0, ans=0.0 2024-08-12 11:44:34,094 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.182e-01 2024-08-12 11:44:49,163 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2500, loss[loss=0.125, beats_loss=0.007778, ecapa_loss=0.0001586, whisper_loss=0.1156, over 16031.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001755, whisper_loss=0.09178, over 3850912.61 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:44:49,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1619110.0, ans=0.1 2024-08-12 11:44:51,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1619110.0, ans=0.2 2024-08-12 11:44:54,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1619110.0, ans=0.125 2024-08-12 11:45:07,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. 
limit=15.0 2024-08-12 11:45:10,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.514e+01 2.796e+01 3.106e+01 8.282e+01, threshold=5.592e+01, percent-clipped=2.0 2024-08-12 11:45:27,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-12 11:45:39,589 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 from AS 2024-08-12 11:45:53,578 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-12 11:45:59,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1619410.0, ans=0.1 2024-08-12 11:46:11,477 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 from AS 2024-08-12 11:46:33,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1619510.0, ans=0.125 2024-08-12 11:46:33,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1619510.0, ans=0.125 2024-08-12 11:46:38,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-08-12 11:46:39,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2550, loss[loss=0.1326, beats_loss=0.007295, ecapa_loss=0.0001928, whisper_loss=0.1234, over 21693.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.0001743, whisper_loss=0.09242, over 3895331.90 frames. ], batch size: 85, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:47:01,892 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
23 from LS+wenet, 21 from Vox, 34 from AS 2024-08-12 11:47:02,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1619710.0, ans=0.125 2024-08-12 11:47:21,104 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 from AS 2024-08-12 11:47:25,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-12 11:47:31,270 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 20 from LS+wenet, 13 from Vox, 20 from AS 2024-08-12 11:47:53,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1620010.0, ans=0.125 2024-08-12 11:47:59,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1620010.0, ans=0.05 2024-08-12 11:48:05,523 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2600, loss[loss=0.1193, beats_loss=0.008744, ecapa_loss=0.0001917, whisper_loss=0.1086, over 22037.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01088, ecapa_loss=0.0001754, whisper_loss=0.09204, over 3884703.04 frames. ], batch size: 90, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:48:11,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.75 vs. limit=22.5 2024-08-12 11:48:19,777 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
19 from LS+wenet, 23 from Vox, 27 from AS 2024-08-12 11:48:21,138 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.606e+01 2.871e+01 3.471e+01 6.871e+01, threshold=5.743e+01, percent-clipped=3.0 2024-08-12 11:48:27,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2024-08-12 11:48:36,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1620310.0, ans=0.125 2024-08-12 11:48:39,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1620310.0, ans=0.2 2024-08-12 11:48:48,798 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 33 from LS+wenet, 18 from Vox, 33 from AS 2024-08-12 11:49:00,072 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 11:49:11,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1620510.0, ans=0.0 2024-08-12 11:49:24,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2650, loss[loss=0.104, beats_loss=0.009155, ecapa_loss=0.0002268, whisper_loss=0.09253, over 16787.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001757, whisper_loss=0.09223, over 3877812.05 frames.
], batch size: 70, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:49:48,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1620710.0, ans=0.125 2024-08-12 11:49:53,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1620710.0, ans=0.125 2024-08-12 11:50:03,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1620810.0, ans=0.0 2024-08-12 11:50:14,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-12 11:50:16,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1620910.0, ans=0.125 2024-08-12 11:50:18,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1620910.0, ans=0.125 2024-08-12 11:50:24,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1620910.0, ans=0.125 2024-08-12 11:50:32,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1621010.0, ans=0.05 2024-08-12 11:50:42,296 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2700, loss[loss=0.1157, beats_loss=0.0117, ecapa_loss=0.0001461, whisper_loss=0.1025, over 20365.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001749, whisper_loss=0.09152, over 3883320.62 frames. 
], batch size: 79, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:50:58,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.509e+01 2.801e+01 3.158e+01 4.809e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-12 11:51:03,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1621210.0, ans=0.2 2024-08-12 11:51:05,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1621210.0, ans=0.125 2024-08-12 11:51:14,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1621310.0, ans=0.125 2024-08-12 11:51:37,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1621410.0, ans=0.125 2024-08-12 11:51:52,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1621510.0, ans=0.1 2024-08-12 11:51:52,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-12 11:51:52,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2024-08-12 11:51:53,291 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 11:52:02,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2750, loss[loss=0.08494, beats_loss=0.009304, ecapa_loss=0.0002271, whisper_loss=0.07337, over 13904.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001755, whisper_loss=0.09177, over 3888551.87 frames. 
], batch size: 57, lr: 5.33e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:52:04,238 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 11:52:07,609 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 11:52:20,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1621710.0, ans=0.125 2024-08-12 11:52:36,009 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 11:52:43,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.10 vs. limit=22.5 2024-08-12 11:52:55,805 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 11:52:56,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1621910.0, ans=0.0 2024-08-12 11:52:56,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1621910.0, ans=0.125 2024-08-12 11:53:11,789 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-12 11:53:18,005 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 11:53:20,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=12.0 2024-08-12 11:53:22,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2800, loss[loss=0.1099, beats_loss=0.009909, ecapa_loss=0.0001745, whisper_loss=0.09826, over 19105.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01097, ecapa_loss=0.000175, whisper_loss=0.09257, over 3900592.75 frames. 
], batch size: 76, lr: 5.33e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:53:26,799 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 35 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 11:53:28,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1622110.0, ans=0.125 2024-08-12 11:53:37,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.464e+01 2.680e+01 3.068e+01 4.016e+01, threshold=5.359e+01, percent-clipped=0.0 2024-08-12 11:54:14,181 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 40 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 11:54:33,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1622510.0, ans=0.1 2024-08-12 11:54:35,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-12 11:54:43,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2850, loss[loss=0.1282, beats_loss=0.013, ecapa_loss=0.00014, whisper_loss=0.1138, over 15332.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001751, whisper_loss=0.09226, over 3870673.31 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:55:09,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1622710.0, ans=0.125 2024-08-12 11:55:19,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. 
limit=15.0 2024-08-12 11:55:27,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1622810.0, ans=0.0 2024-08-12 11:55:39,400 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 11:56:05,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2900, loss[loss=0.1, beats_loss=0.009569, ecapa_loss=0.0001709, whisper_loss=0.08875, over 21771.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01093, ecapa_loss=0.0001761, whisper_loss=0.09265, over 3852392.22 frames. ], batch size: 86, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:56:12,652 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-12 11:56:14,044 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 11:56:14,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1623110.0, ans=0.0 2024-08-12 11:56:20,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.518e+01 2.817e+01 3.035e+01 4.423e+01, threshold=5.633e+01, percent-clipped=0.0 2024-08-12 11:56:48,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1623310.0, ans=0.125 2024-08-12 11:57:06,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.651e+05 2024-08-12 11:57:11,076 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 11:57:18,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1623510.0, ans=0.0 2024-08-12 11:57:25,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 2950, loss[loss=0.09838, beats_loss=0.01003, ecapa_loss=0.0001902, whisper_loss=0.08645, over 21377.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01091, ecapa_loss=0.0001764, whisper_loss=0.0928, over 3875609.28 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 1.152921504606847e+18 2024-08-12 11:57:25,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1623610.0, ans=0.2 2024-08-12 11:57:27,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1623610.0, ans=0.125 2024-08-12 11:57:30,151 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 11:57:51,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=12.0 2024-08-12 11:58:07,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1623810.0, ans=0.1 2024-08-12 11:58:17,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2024-08-12 11:58:19,735 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-12 11:58:26,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1623910.0, ans=0.0 2024-08-12 11:58:41,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1624010.0, ans=0.125 2024-08-12 11:58:44,663 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3000, loss[loss=0.1036, beats_loss=0.009874, ecapa_loss=0.0001973, whisper_loss=0.09173, over 17424.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01094, ecapa_loss=0.0001749, whisper_loss=0.09261, over 3875217.53 frames. ], batch size: 71, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 11:58:44,665 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 11:59:25,765 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on ASR_libri: loss=0.256, beats_loss=0, ecapa_loss=0.0005941, whisper_loss=0.2501, over 922467.00 frames. 2024-08-12 11:59:45,032 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on SV_voxceleb1: loss=0.00471, beats_loss=0, ecapa_loss=0.000471, whisper_loss=0, over 939242.00 frames. 2024-08-12 12:01:46,921 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on AT_audioset: loss=0.02429, beats_loss=0.02429, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 12:01:46,926 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 12:01:49,953 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 12:01:56,501 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 12:01:58,145 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
24 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-12 12:02:03,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.553e+01 2.970e+01 3.483e+01 4.771e+01, threshold=5.939e+01, percent-clipped=0.0 2024-08-12 12:02:17,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1624310.0, ans=0.125 2024-08-12 12:02:21,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624310.0, ans=0.1 2024-08-12 12:02:35,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0 2024-08-12 12:02:39,636 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 25 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-12 12:03:03,991 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 12:03:05,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3050, loss[loss=0.1154, beats_loss=0.009799, ecapa_loss=0.0001783, whisper_loss=0.1039, over 19128.00 frames. ], tot_loss[loss=0.1066, beats_loss=0.0109, ecapa_loss=0.0001748, whisper_loss=0.09391, over 3890432.29 frames. ], batch size: 74, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:03:10,295 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-12 12:03:22,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.13 vs. 
limit=15.0 2024-08-12 12:03:46,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1624810.0, ans=0.1 2024-08-12 12:03:50,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1624810.0, ans=0.125 2024-08-12 12:04:04,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-12 12:04:10,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1625010.0, ans=0.95 2024-08-12 12:04:24,109 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 12:04:25,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3100, loss[loss=0.1141, beats_loss=0.008666, ecapa_loss=0.0001627, whisper_loss=0.1038, over 15005.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.000175, whisper_loss=0.09283, over 3896529.86 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:04:42,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.511e+01 2.833e+01 3.211e+01 6.314e+01, threshold=5.667e+01, percent-clipped=1.0 2024-08-12 12:04:43,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2024-08-12 12:05:30,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1625510.0, ans=0.125 2024-08-12 12:05:33,080 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 12:05:44,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3150, loss[loss=0.08366, beats_loss=0.01369, ecapa_loss=0.0001359, whisper_loss=0.06861, over 19800.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.011, ecapa_loss=0.0001757, whisper_loss=0.09232, over 3850274.88 frames. ], batch size: 76, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:05:47,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1625610.0, ans=0.125 2024-08-12 12:06:03,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1625710.0, ans=0.0 2024-08-12 12:06:05,377 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 12:06:15,974 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 12:06:34,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1625910.0, ans=0.1 2024-08-12 12:06:45,722 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-12 12:06:51,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1626010.0, ans=0.125 2024-08-12 12:07:03,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3200, loss[loss=0.1265, beats_loss=0.01077, ecapa_loss=0.0001567, whisper_loss=0.1142, over 23253.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01095, ecapa_loss=0.0001759, whisper_loss=0.0932, over 3851237.20 frames. 
], batch size: 91, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:07:05,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1626110.0, ans=0.035 2024-08-12 12:07:21,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.397e+01 2.776e+01 3.062e+01 4.690e+01, threshold=5.551e+01, percent-clipped=0.0 2024-08-12 12:07:27,680 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 12:07:49,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1626310.0, ans=0.015 2024-08-12 12:07:53,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1626410.0, ans=0.0 2024-08-12 12:08:06,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1626510.0, ans=0.025 2024-08-12 12:08:15,522 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 12:08:19,945 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 22 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-12 12:08:22,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3250, loss[loss=0.09045, beats_loss=0.01496, ecapa_loss=0.0001582, whisper_loss=0.0739, over 21770.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01093, ecapa_loss=0.0001764, whisper_loss=0.09368, over 3842491.97 frames. ], batch size: 91, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:08:23,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1626610.0, ans=0.125 2024-08-12 12:08:30,492 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 12:08:55,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1626810.0, ans=0.125 2024-08-12 12:08:57,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1626810.0, ans=0.2 2024-08-12 12:09:05,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=10.0 2024-08-12 12:09:15,765 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.991e+00 2024-08-12 12:09:42,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3300, loss[loss=0.09123, beats_loss=0.01274, ecapa_loss=0.0001754, whisper_loss=0.07674, over 13212.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01097, ecapa_loss=0.0001779, whisper_loss=0.09266, over 3832527.03 frames. ], batch size: 53, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:09:42,437 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 32 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 12:09:48,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1627110.0, ans=0.0 2024-08-12 12:09:58,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.622e+01 3.065e+01 3.686e+01 1.090e+02, threshold=6.129e+01, percent-clipped=1.0 2024-08-12 12:10:02,318 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 12:10:05,140 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 12:10:13,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.67 vs. 
limit=15.0 2024-08-12 12:10:14,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1627310.0, ans=0.125 2024-08-12 12:10:20,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1627310.0, ans=0.125 2024-08-12 12:10:43,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1627510.0, ans=0.125 2024-08-12 12:10:52,465 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 12:10:56,711 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 12:10:59,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3350, loss[loss=0.1021, beats_loss=0.01105, ecapa_loss=0.0001705, whisper_loss=0.08934, over 16621.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01089, ecapa_loss=0.0001784, whisper_loss=0.09313, over 3832845.64 frames. ], batch size: 65, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:11:01,404 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-12 12:11:11,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2024-08-12 12:11:18,921 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 12:11:21,851 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-12 12:11:28,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1627710.0, ans=0.0 2024-08-12 12:11:35,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1627810.0, ans=0.0 2024-08-12 12:11:37,688 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 12:11:38,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1627810.0, ans=0.0 2024-08-12 12:11:45,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0 2024-08-12 12:11:49,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2024-08-12 12:11:53,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1627910.0, ans=0.125 2024-08-12 12:12:17,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3400, loss[loss=0.1074, beats_loss=0.009459, ecapa_loss=0.0001515, whisper_loss=0.09642, over 16922.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01092, ecapa_loss=0.0001766, whisper_loss=0.0933, over 3852642.25 frames. ], batch size: 62, lr: 5.32e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:12:35,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.455e+01 2.782e+01 3.017e+01 1.106e+02, threshold=5.563e+01, percent-clipped=1.0 2024-08-12 12:12:38,927 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 12:12:57,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1628310.0, ans=0.0 2024-08-12 12:12:58,917 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 12:13:10,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1628410.0, ans=0.125 2024-08-12 12:13:26,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2024-08-12 12:13:29,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1628510.0, ans=0.1 2024-08-12 12:13:36,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3450, loss[loss=0.0969, beats_loss=0.01167, ecapa_loss=0.0001645, whisper_loss=0.08358, over 15212.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01093, ecapa_loss=0.0001765, whisper_loss=0.09292, over 3846403.61 frames. ], batch size: 59, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:13:41,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1628610.0, ans=0.2 2024-08-12 12:13:50,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=15.0 2024-08-12 12:14:04,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1628710.0, ans=0.05 2024-08-12 12:14:10,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1628810.0, ans=0.0 2024-08-12 12:14:38,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1629010.0, ans=0.0 2024-08-12 12:14:38,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1629010.0, ans=0.2 2024-08-12 12:14:51,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1629010.0, ans=0.125 2024-08-12 12:14:53,503 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3500, loss[loss=0.09745, beats_loss=0.01402, ecapa_loss=0.0001691, whisper_loss=0.08174, over 21534.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01093, ecapa_loss=0.0001774, whisper_loss=0.09265, over 3876801.75 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:15:10,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.037e+01 2.491e+01 2.788e+01 3.215e+01 5.809e+01, threshold=5.577e+01, percent-clipped=2.0 2024-08-12 12:15:25,353 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 12:15:25,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1629310.0, ans=0.125 2024-08-12 12:15:41,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1629410.0, ans=0.125 2024-08-12 12:16:02,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-08-12 12:16:04,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1629510.0, ans=0.0 2024-08-12 12:16:09,442 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 12:16:11,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=12.0 2024-08-12 12:16:12,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3550, loss[loss=0.09522, beats_loss=0.01092, ecapa_loss=0.0001522, whisper_loss=0.08277, over 14271.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001767, whisper_loss=0.09252, over 3899056.76 frames. ], batch size: 55, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:16:24,417 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-12 12:16:46,493 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 25 from LS+wenet, 8 from Vox, 24 fro AS 2024-08-12 12:17:28,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3600, loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.0001465, whisper_loss=0.09136, over 22561.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01095, ecapa_loss=0.0001768, whisper_loss=0.09212, over 3890000.16 frames. 
], batch size: 87, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:17:36,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1630110.0, ans=0.125 2024-08-12 12:17:45,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.089e+01 2.537e+01 2.866e+01 3.271e+01 6.335e+01, threshold=5.732e+01, percent-clipped=1.0 2024-08-12 12:17:47,435 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 12:17:56,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1630210.0, ans=0.5 2024-08-12 12:18:16,466 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 12:18:28,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1630410.0, ans=0.0 2024-08-12 12:18:43,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1630510.0, ans=0.0 2024-08-12 12:18:44,937 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 12:18:45,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2024-08-12 12:18:46,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3650, loss[loss=0.1218, beats_loss=0.01029, ecapa_loss=0.0001526, whisper_loss=0.11, over 18683.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01099, ecapa_loss=0.0001764, whisper_loss=0.09166, over 3860177.96 frames. 
], batch size: 72, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:18:49,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1630610.0, ans=0.125 2024-08-12 12:19:04,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1630710.0, ans=0.0 2024-08-12 12:19:19,908 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 12:19:25,687 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:19:39,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1630910.0, ans=0.125 2024-08-12 12:19:49,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1631010.0, ans=0.125 2024-08-12 12:20:05,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3700, loss[loss=0.1093, beats_loss=0.01045, ecapa_loss=0.000183, whisper_loss=0.09699, over 22568.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001775, whisper_loss=0.09202, over 3877125.00 frames. ], batch size: 94, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:20:09,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2024-08-12 12:20:23,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+01 2.690e+01 3.090e+01 3.461e+01 6.737e+01, threshold=6.180e+01, percent-clipped=1.0 2024-08-12 12:20:24,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1631210.0, ans=0.1 2024-08-12 12:20:25,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5 2024-08-12 12:20:34,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1631210.0, ans=0.125 2024-08-12 12:20:44,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1631310.0, ans=0.2 2024-08-12 12:21:24,866 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3750, loss[loss=0.1114, beats_loss=0.009755, ecapa_loss=0.0002261, whisper_loss=0.0994, over 18077.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001765, whisper_loss=0.09192, over 3862130.43 frames. ], batch size: 72, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:21:25,273 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-12 12:21:48,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1631710.0, ans=0.1 2024-08-12 12:22:05,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.61 vs. 
limit=12.0 2024-08-12 12:22:19,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1631910.0, ans=15.0 2024-08-12 12:22:28,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1632010.0, ans=0.0 2024-08-12 12:22:35,042 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 12:22:44,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3800, loss[loss=0.1044, beats_loss=0.01175, ecapa_loss=0.000165, whisper_loss=0.09104, over 14885.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001774, whisper_loss=0.09183, over 3869990.67 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:23:02,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.541e+01 2.857e+01 3.346e+01 7.613e+01, threshold=5.713e+01, percent-clipped=1.0 2024-08-12 12:23:33,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0 2024-08-12 12:24:02,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3850, loss[loss=0.1011, beats_loss=0.01227, ecapa_loss=0.0001245, whisper_loss=0.08763, over 22343.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01114, ecapa_loss=0.0001757, whisper_loss=0.09096, over 3886872.41 frames. ], batch size: 86, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:24:04,398 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
23 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 12:24:09,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1632610.0, ans=0.2 2024-08-12 12:24:14,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1632610.0, ans=0.125 2024-08-12 12:24:16,233 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 12:24:16,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1632710.0, ans=0.5 2024-08-12 12:24:49,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1632910.0, ans=0.125 2024-08-12 12:24:49,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1632910.0, ans=0.0 2024-08-12 12:25:12,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1633010.0, ans=0.125 2024-08-12 12:25:22,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3900, loss[loss=0.08327, beats_loss=0.01095, ecapa_loss=0.0001818, whisper_loss=0.0705, over 16350.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.0001771, whisper_loss=0.09151, over 3888010.77 frames. ], batch size: 67, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:25:27,840 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-12 12:25:39,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.512e+01 2.803e+01 3.159e+01 7.102e+01, threshold=5.607e+01, percent-clipped=1.0 2024-08-12 12:25:58,637 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-12 12:26:00,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-12 12:26:10,881 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 12:26:11,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1633410.0, ans=0.1 2024-08-12 12:26:20,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1633410.0, ans=0.0 2024-08-12 12:26:41,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 3950, loss[loss=0.1016, beats_loss=0.01131, ecapa_loss=0.0001877, whisper_loss=0.0884, over 22329.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001778, whisper_loss=0.09214, over 3920768.84 frames. ], batch size: 92, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:26:57,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1633710.0, ans=0.125 2024-08-12 12:27:00,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1633710.0, ans=0.2 2024-08-12 12:27:02,341 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.435e+02 2024-08-12 12:27:10,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1633710.0, ans=0.125 2024-08-12 12:27:17,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1633810.0, ans=0.04949747468305833 2024-08-12 12:27:19,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1633810.0, 
ans=0.0 2024-08-12 12:27:28,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-08-12 12:27:29,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1633910.0, ans=0.1 2024-08-12 12:27:31,329 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-12 12:27:54,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=15.0 2024-08-12 12:28:00,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4000, loss[loss=0.121, beats_loss=0.008134, ecapa_loss=0.0001681, whisper_loss=0.1111, over 17927.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.000179, whisper_loss=0.09227, over 3920536.90 frames. ], batch size: 67, lr: 5.31e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:28:03,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1634110.0, ans=0.0 2024-08-12 12:28:16,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.174e+01 2.585e+01 2.882e+01 3.381e+01 6.617e+01, threshold=5.764e+01, percent-clipped=3.0 2024-08-12 12:28:17,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1634210.0, ans=0.2 2024-08-12 12:28:23,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1634210.0, ans=0.125 2024-08-12 12:28:29,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1634310.0, ans=0.0 2024-08-12 12:28:31,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, 
batch_count=1634310.0, ans=0.07 2024-08-12 12:28:31,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1634310.0, ans=0.1 2024-08-12 12:28:40,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1634310.0, ans=0.0 2024-08-12 12:28:53,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1634410.0, ans=0.0 2024-08-12 12:29:08,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1634510.0, ans=0.125 2024-08-12 12:29:18,913 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4050, loss[loss=0.08666, beats_loss=0.01033, ecapa_loss=0.0002008, whisper_loss=0.07433, over 20162.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01087, ecapa_loss=0.0001802, whisper_loss=0.0927, over 3909885.83 frames. ], batch size: 81, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:29:20,670 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-12 12:29:20,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1634610.0, ans=0.1 2024-08-12 12:29:28,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1634610.0, ans=0.125 2024-08-12 12:29:33,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1634710.0, ans=0.125 2024-08-12 12:29:41,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-08-12 12:29:53,539 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
23 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 12:29:58,769 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 12:30:19,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1634910.0, ans=0.0 2024-08-12 12:30:39,526 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4100, loss[loss=0.09746, beats_loss=0.00984, ecapa_loss=0.0002368, whisper_loss=0.08525, over 18882.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.0001812, whisper_loss=0.09257, over 3912490.42 frames. ], batch size: 79, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:30:41,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1635110.0, ans=0.0 2024-08-12 12:30:46,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1635110.0, ans=0.0 2024-08-12 12:30:50,733 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 12:30:56,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.058e+01 2.490e+01 2.729e+01 3.052e+01 9.662e+01, threshold=5.458e+01, percent-clipped=1.0 2024-08-12 12:31:05,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1635210.0, ans=0.125 2024-08-12 12:31:18,000 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-12 12:31:24,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1635310.0, ans=0.1 2024-08-12 12:31:27,662 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 12:31:29,116 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 12:31:31,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2024-08-12 12:32:00,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4150, loss[loss=0.09132, beats_loss=0.01188, ecapa_loss=0.0001609, whisper_loss=0.07783, over 22278.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01099, ecapa_loss=0.0001794, whisper_loss=0.09249, over 3907259.74 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:32:03,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.54 vs. limit=22.5 2024-08-12 12:32:12,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1635610.0, ans=0.125 2024-08-12 12:33:00,203 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 12:33:18,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2024-08-12 12:33:20,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4200, loss[loss=0.1119, beats_loss=0.009873, ecapa_loss=0.0001639, whisper_loss=0.1004, over 22757.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01092, ecapa_loss=0.0001794, whisper_loss=0.0924, over 3896631.93 frames. 
], batch size: 89, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:33:25,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1636110.0, ans=0.125 2024-08-12 12:33:28,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1636110.0, ans=0.1 2024-08-12 12:33:37,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.467e+01 2.734e+01 3.043e+01 4.289e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 12:33:42,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1636210.0, ans=0.125 2024-08-12 12:34:01,096 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-12 12:34:06,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1636410.0, ans=10.0 2024-08-12 12:34:07,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1636410.0, ans=0.1 2024-08-12 12:34:11,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1636410.0, ans=0.0 2024-08-12 12:34:13,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1636410.0, ans=0.125 2024-08-12 12:34:39,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4250, loss[loss=0.08764, beats_loss=0.01213, ecapa_loss=0.0001704, whisper_loss=0.07381, over 22151.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.011, ecapa_loss=0.0001785, whisper_loss=0.09203, over 3906093.94 frames. ], batch size: 91, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:34:41,422 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 12:34:46,021 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 12:34:51,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-12 12:34:56,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1636710.0, ans=0.1 2024-08-12 12:34:58,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-12 12:35:02,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1636710.0, ans=10.0 2024-08-12 12:35:26,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2024-08-12 12:35:35,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1636910.0, ans=0.0 2024-08-12 12:35:52,784 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-12 12:35:58,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4300, loss[loss=0.1223, beats_loss=0.009148, ecapa_loss=0.000177, whisper_loss=0.1114, over 21370.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01104, ecapa_loss=0.000178, whisper_loss=0.09153, over 3888089.68 frames. ], batch size: 82, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:36:02,054 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 12:36:05,235 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 12:36:09,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-12 12:36:15,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.507e+01 2.747e+01 3.144e+01 4.891e+01, threshold=5.494e+01, percent-clipped=0.0 2024-08-12 12:36:41,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1637310.0, ans=0.125 2024-08-12 12:37:10,999 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 12:37:16,933 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4350, loss[loss=0.1202, beats_loss=0.01157, ecapa_loss=0.0001432, whisper_loss=0.1072, over 23900.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01103, ecapa_loss=0.0001784, whisper_loss=0.09141, over 3899984.13 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:37:30,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1637610.0, ans=0.2 2024-08-12 12:37:37,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1637710.0, ans=0.1 2024-08-12 12:37:59,971 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 12:38:09,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1637910.0, ans=0.1 2024-08-12 12:38:27,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1638010.0, ans=0.125 2024-08-12 12:38:30,800 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
29 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-12 12:38:36,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4400, loss[loss=0.1089, beats_loss=0.01124, ecapa_loss=0.0001653, whisper_loss=0.09596, over 22430.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01107, ecapa_loss=0.0001781, whisper_loss=0.09123, over 3913888.05 frames. ], batch size: 89, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:38:42,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1638110.0, ans=0.0 2024-08-12 12:38:55,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.518e+01 2.794e+01 3.242e+01 9.315e+01, threshold=5.589e+01, percent-clipped=2.0 2024-08-12 12:39:11,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1638310.0, ans=0.2 2024-08-12 12:39:14,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2024-08-12 12:39:22,447 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 12:39:40,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1638410.0, ans=0.95 2024-08-12 12:39:44,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-12 12:39:50,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-12 12:39:59,148 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4450, loss[loss=0.1266, beats_loss=0.007479, ecapa_loss=0.0001818, whisper_loss=0.1173, over 18174.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.000177, whisper_loss=0.09164, over 3921336.51 frames. ], batch size: 67, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:40:06,798 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 12:40:07,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1638610.0, ans=0.125 2024-08-12 12:40:11,918 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.563e-03 2024-08-12 12:40:13,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1638710.0, ans=0.0 2024-08-12 12:40:31,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1638810.0, ans=0.0 2024-08-12 12:40:36,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1638810.0, ans=0.125 2024-08-12 12:40:49,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1638910.0, ans=0.125 2024-08-12 12:41:02,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1639010.0, ans=0.0 2024-08-12 12:41:19,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4500, loss[loss=0.09975, beats_loss=0.009669, ecapa_loss=0.0001995, whisper_loss=0.08809, over 16112.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001759, whisper_loss=0.09179, over 3907725.06 frames. 
], batch size: 64, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:41:37,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.496e+01 3.007e+01 3.529e+01 6.889e+01, threshold=6.014e+01, percent-clipped=3.0 2024-08-12 12:41:41,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-12 12:42:02,293 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 12:42:04,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1639310.0, ans=0.04949747468305833 2024-08-12 12:42:05,335 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-12 12:42:14,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1639410.0, ans=0.125 2024-08-12 12:42:25,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2024-08-12 12:42:31,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1639510.0, ans=0.2 2024-08-12 12:42:38,279 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4550, loss[loss=0.1045, beats_loss=0.0126, ecapa_loss=0.0001725, whisper_loss=0.09018, over 16777.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001769, whisper_loss=0.09173, over 3901906.34 frames. 
], batch size: 68, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:42:42,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1639610.0, ans=0.1 2024-08-12 12:42:47,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1639610.0, ans=0.0 2024-08-12 12:42:57,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1639710.0, ans=0.125 2024-08-12 12:43:02,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1639710.0, ans=0.1 2024-08-12 12:43:19,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1639810.0, ans=0.125 2024-08-12 12:43:33,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0 2024-08-12 12:43:37,071 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-164000.pt 2024-08-12 12:43:41,851 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.712e-01 2024-08-12 12:43:42,900 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 12:43:44,250 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 12:43:45,624 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 12:43:57,996 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4600, loss[loss=0.1125, beats_loss=0.01219, ecapa_loss=0.0001561, whisper_loss=0.09879, over 23853.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001767, whisper_loss=0.09168, over 3871501.15 frames. ], batch size: 92, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:44:01,305 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-12 12:44:08,750 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 12:44:14,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.449e+01 2.715e+01 3.086e+01 6.580e+01, threshold=5.431e+01, percent-clipped=1.0 2024-08-12 12:44:20,117 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 12:44:20,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-08-12 12:44:23,276 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 12:44:39,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1640310.0, ans=0.07 2024-08-12 12:45:07,847 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 21 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-12 12:45:15,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1640610.0, ans=0.125 2024-08-12 12:45:16,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4650, loss[loss=0.09663, beats_loss=0.01106, ecapa_loss=0.0001608, whisper_loss=0.08396, over 21937.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01105, ecapa_loss=0.0001784, whisper_loss=0.09104, over 3865183.09 frames. ], batch size: 87, lr: 5.30e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:45:25,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1640610.0, ans=0.2 2024-08-12 12:45:27,847 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 12:45:29,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1640610.0, ans=0.125 2024-08-12 12:45:37,913 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 12:45:42,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1640710.0, ans=0.0 2024-08-12 12:45:59,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1640810.0, ans=0.1 2024-08-12 12:46:00,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0 2024-08-12 12:46:14,625 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-12 12:46:29,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1641010.0, ans=0.0 2024-08-12 12:46:36,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4700, loss[loss=0.09765, beats_loss=0.01362, ecapa_loss=0.0001265, whisper_loss=0.08276, over 23271.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001778, whisper_loss=0.09184, over 3893007.05 frames. 
], batch size: 91, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:46:54,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.062e+01 2.509e+01 2.776e+01 3.112e+01 6.525e+01, threshold=5.552e+01, percent-clipped=1.0 2024-08-12 12:46:59,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=12.0 2024-08-12 12:47:04,298 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 12:47:04,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1641210.0, ans=0.125 2024-08-12 12:47:07,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1641310.0, ans=0.125 2024-08-12 12:47:27,617 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.368e+05 2024-08-12 12:47:54,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4750, loss[loss=0.1006, beats_loss=0.01246, ecapa_loss=0.0001758, whisper_loss=0.08634, over 16864.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001789, whisper_loss=0.09185, over 3867009.96 frames. ], batch size: 68, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:47:55,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-08-12 12:48:04,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1641610.0, ans=0.1 2024-08-12 12:48:07,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1641610.0, ans=0.125 2024-08-12 12:48:16,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1641710.0, ans=0.05 2024-08-12 12:48:25,335 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 12:48:27,056 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 18 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-12 12:49:02,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1642010.0, ans=0.2 2024-08-12 12:49:10,816 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4800, loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001661, whisper_loss=0.08967, over 19170.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0111, ecapa_loss=0.0001792, whisper_loss=0.09078, over 3862850.49 frames. ], batch size: 76, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:49:20,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-12 12:49:27,504 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 36 from Vox, 30 fro AS 2024-08-12 12:49:28,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.489e+01 2.813e+01 3.178e+01 7.863e+01, threshold=5.627e+01, percent-clipped=2.0 2024-08-12 12:49:28,777 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 12:49:48,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1642310.0, ans=0.05 2024-08-12 12:49:50,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1642310.0, ans=0.125 2024-08-12 12:49:55,373 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 12:50:01,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1642410.0, ans=0.0 2024-08-12 12:50:04,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1642410.0, ans=0.125 2024-08-12 12:50:28,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4850, loss[loss=0.1069, beats_loss=0.01102, ecapa_loss=0.0001322, whisper_loss=0.09458, over 17129.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01106, ecapa_loss=0.0001786, whisper_loss=0.0912, over 3873777.84 frames. ], batch size: 64, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:50:51,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1642710.0, ans=0.125 2024-08-12 12:50:59,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1642810.0, ans=0.05 2024-08-12 12:51:07,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.51 vs. 
limit=10.0 2024-08-12 12:51:26,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1642910.0, ans=0.0 2024-08-12 12:51:29,007 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 12:51:47,251 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4900, loss[loss=0.1144, beats_loss=0.0102, ecapa_loss=0.0001805, whisper_loss=0.1024, over 14179.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001793, whisper_loss=0.09102, over 3853602.83 frames. ], batch size: 59, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:52:00,987 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 12:52:01,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1643210.0, ans=0.1 2024-08-12 12:52:03,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.565e+01 2.777e+01 3.230e+01 5.434e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 12:52:04,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1643210.0, ans=0.125 2024-08-12 12:52:13,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1643210.0, ans=0.2 2024-08-12 12:52:15,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1643210.0, ans=0.0 2024-08-12 12:52:22,857 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-12 12:52:45,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1643410.0, ans=0.0 2024-08-12 12:53:02,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 4950, loss[loss=0.1058, beats_loss=0.01039, ecapa_loss=0.0001643, whisper_loss=0.09379, over 16935.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01103, ecapa_loss=0.0001788, whisper_loss=0.09065, over 3843231.37 frames. ], batch size: 63, lr: 5.29e-03, grad_scale: 5.764607523034235e+17 2024-08-12 12:53:04,417 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 12:53:14,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1643610.0, ans=0.125 2024-08-12 12:53:26,563 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 12:53:28,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1643710.0, ans=0.0 2024-08-12 12:53:29,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1643710.0, ans=0.125 2024-08-12 12:53:43,852 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 36 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 12:53:49,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 12:53:57,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-12 12:53:59,403 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 12:54:14,975 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 12:54:20,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5000, loss[loss=0.07577, beats_loss=0.01376, ecapa_loss=0.0001596, whisper_loss=0.06041, over 16893.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001774, whisper_loss=0.0915, over 3832511.68 frames. ], batch size: 70, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:54:31,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1644110.0, ans=0.0 2024-08-12 12:54:36,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.085e+01 2.385e+01 2.734e+01 3.105e+01 6.733e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-12 12:54:37,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1644210.0, ans=0.125 2024-08-12 12:55:12,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1644410.0, ans=0.05 2024-08-12 12:55:16,106 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 12:55:20,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2024-08-12 12:55:37,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5050, loss[loss=0.1274, beats_loss=0.009699, ecapa_loss=0.0001674, whisper_loss=0.116, over 23042.00 frames. ], tot_loss[loss=0.105, beats_loss=0.011, ecapa_loss=0.0001785, whisper_loss=0.09225, over 3856788.03 frames. ], batch size: 89, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:55:42,548 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 41 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 12:55:52,906 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-12 12:55:57,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1644710.0, ans=0.1 2024-08-12 12:56:08,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1644810.0, ans=0.0 2024-08-12 12:56:25,148 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 12:56:27,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1644910.0, ans=0.125 2024-08-12 12:56:32,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.50 vs. limit=6.0 2024-08-12 12:56:56,007 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5100, loss[loss=0.09435, beats_loss=0.01306, ecapa_loss=0.0001692, whisper_loss=0.0796, over 21766.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01106, ecapa_loss=0.000179, whisper_loss=0.09161, over 3868275.88 frames. ], batch size: 89, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:57:00,390 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 12:57:13,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.597e+01 2.875e+01 3.428e+01 8.355e+01, threshold=5.751e+01, percent-clipped=1.0 2024-08-12 12:57:37,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:38,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1645310.0, ans=0.0 2024-08-12 12:57:45,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1645410.0, ans=0.125 2024-08-12 12:57:49,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1645410.0, ans=0.125 2024-08-12 12:57:56,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1645510.0, ans=0.1 2024-08-12 12:58:08,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1645510.0, ans=0.1 2024-08-12 12:58:10,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=12.0 2024-08-12 12:58:12,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5150, loss[loss=0.1012, beats_loss=0.01186, ecapa_loss=0.0001673, whisper_loss=0.0877, over 19818.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01105, ecapa_loss=0.000178, whisper_loss=0.09178, over 3858188.87 frames. ], batch size: 78, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:58:12,535 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 12:58:26,141 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 12:58:32,567 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 11 from Vox, 51 fro AS 2024-08-12 12:58:45,806 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 12:58:53,166 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 12:59:01,728 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 12:59:04,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1645910.0, ans=0.2 2024-08-12 12:59:21,134 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.800e-02 2024-08-12 12:59:23,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1646110.0, ans=0.0 2024-08-12 12:59:23,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1646110.0, ans=0.1 2024-08-12 12:59:24,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5200, loss[loss=0.1078, beats_loss=0.0103, ecapa_loss=0.0001962, whisper_loss=0.09555, over 13187.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001771, whisper_loss=0.09184, over 3832515.16 frames. ], batch size: 54, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 12:59:26,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1646110.0, ans=0.0 2024-08-12 12:59:33,639 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 12:59:39,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.230e+01 2.626e+01 2.923e+01 3.403e+01 3.236e+02, threshold=5.847e+01, percent-clipped=1.0 2024-08-12 13:00:07,061 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-12 13:00:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1646410.0, ans=0.0 2024-08-12 13:00:14,703 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-12 13:00:30,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1646510.0, ans=0.125 2024-08-12 13:00:32,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5250, loss[loss=0.1033, beats_loss=0.01167, ecapa_loss=0.0001637, whisper_loss=0.08995, over 19491.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001775, whisper_loss=0.09189, over 3797374.94 frames. ], batch size: 78, lr: 5.29e-03, grad_scale: 1.152921504606847e+18 2024-08-12 13:00:36,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-12 13:00:55,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1646710.0, ans=0.025 2024-08-12 13:01:07,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.80 vs. 
limit=15.0 2024-08-12 13:01:08,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1646810.0, ans=0.1 2024-08-12 13:01:11,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-08-12 13:01:24,934 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-12 13:01:38,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5300, loss[loss=0.09277, beats_loss=0.009095, ecapa_loss=0.0002078, whisper_loss=0.0816, over 13952.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001781, whisper_loss=0.09187, over 3800357.17 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:01:43,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-12 13:01:54,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.492e+01 2.766e+01 3.259e+01 2.039e+02, threshold=5.533e+01, percent-clipped=1.0 2024-08-12 13:02:24,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1647410.0, ans=0.125 2024-08-12 13:02:26,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1647410.0, ans=0.2 2024-08-12 13:02:43,812 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5350, loss[loss=0.1309, beats_loss=0.008745, ecapa_loss=0.0001954, whisper_loss=0.1202, over 22707.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001783, whisper_loss=0.09224, over 3828719.30 frames. 
], batch size: 90, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:02:44,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1647610.0, ans=0.125 2024-08-12 13:02:58,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-08-12 13:03:02,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1647710.0, ans=0.2 2024-08-12 13:03:15,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1647810.0, ans=0.1 2024-08-12 13:03:16,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1647810.0, ans=0.1 2024-08-12 13:03:19,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1647810.0, ans=0.0 2024-08-12 13:03:46,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1648010.0, ans=0.125 2024-08-12 13:03:47,367 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 13:03:47,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-12 13:03:48,473 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5400, loss[loss=0.109, beats_loss=0.01099, ecapa_loss=0.0001713, whisper_loss=0.09629, over 15920.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001775, whisper_loss=0.092, over 3828064.24 frames. 
], batch size: 63, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:03:50,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=12.0 2024-08-12 13:03:51,084 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 13:03:59,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1648110.0, ans=0.125 2024-08-12 13:04:04,398 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.540e+01 2.809e+01 3.411e+01 5.713e+01, threshold=5.618e+01, percent-clipped=1.0 2024-08-12 13:04:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1648210.0, ans=0.125 2024-08-12 13:04:12,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1648210.0, ans=0.125 2024-08-12 13:04:20,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1648310.0, ans=0.0 2024-08-12 13:04:26,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1648310.0, ans=0.025 2024-08-12 13:04:30,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2024-08-12 13:04:32,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1648410.0, ans=0.1 2024-08-12 13:04:51,784 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-12 13:04:54,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5450, loss[loss=0.09945, beats_loss=0.01207, ecapa_loss=0.0001592, whisper_loss=0.08578, over 19716.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001786, whisper_loss=0.09212, over 3835191.01 frames. ], batch size: 80, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:05:09,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1648710.0, ans=0.0 2024-08-12 13:05:15,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1648710.0, ans=0.2 2024-08-12 13:05:30,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2024-08-12 13:05:45,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2024-08-12 13:05:58,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=15.0 2024-08-12 13:05:59,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5500, loss[loss=0.09277, beats_loss=0.01148, ecapa_loss=0.000136, whisper_loss=0.07993, over 17284.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001768, whisper_loss=0.09155, over 3830455.09 frames. 
], batch size: 66, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:06:12,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1649210.0, ans=0.125 2024-08-12 13:06:15,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.546e+01 2.808e+01 3.382e+01 4.653e+01, threshold=5.615e+01, percent-clipped=0.0 2024-08-12 13:06:16,896 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 13:06:35,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-12 13:06:37,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.67 vs. limit=10.0 2024-08-12 13:06:46,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2024-08-12 13:06:47,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1649410.0, ans=0.125 2024-08-12 13:06:48,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1649410.0, ans=0.125 2024-08-12 13:06:49,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1649410.0, ans=0.0 2024-08-12 13:06:59,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1649510.0, ans=0.015 2024-08-12 13:07:05,585 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5550, loss[loss=0.1187, beats_loss=0.01041, ecapa_loss=0.0001618, whisper_loss=0.1067, over 23679.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01097, ecapa_loss=0.0001766, whisper_loss=0.09199, over 3852354.17 frames. ], batch size: 94, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:07:06,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1649610.0, ans=0.125 2024-08-12 13:07:08,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1649610.0, ans=0.1 2024-08-12 13:07:14,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=15.0 2024-08-12 13:07:28,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1649710.0, ans=0.125 2024-08-12 13:07:29,655 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 13:07:49,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1649910.0, ans=0.125 2024-08-12 13:08:26,060 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5600, loss[loss=0.08638, beats_loss=0.007736, ecapa_loss=0.0002077, whisper_loss=0.07656, over 13530.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001764, whisper_loss=0.09203, over 3856167.74 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:08:27,298 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 13:08:38,197 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-12 13:08:45,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. 
limit=10.0 2024-08-12 13:08:51,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.532e+01 2.834e+01 3.138e+01 6.030e+01, threshold=5.668e+01, percent-clipped=1.0 2024-08-12 13:08:59,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0 2024-08-12 13:09:20,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.26 vs. limit=15.0 2024-08-12 13:09:24,486 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 13:09:34,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-12 13:09:35,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1650410.0, ans=0.1 2024-08-12 13:09:39,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1650410.0, ans=0.125 2024-08-12 13:09:53,729 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 13:09:55,144 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:09:55,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5650, loss[loss=0.1123, beats_loss=0.009723, ecapa_loss=0.0002293, whisper_loss=0.1003, over 17480.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01097, ecapa_loss=0.0001758, whisper_loss=0.09212, over 3854984.51 frames. 
], batch size: 73, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:10:04,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2024-08-12 13:10:33,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1650810.0, ans=0.125 2024-08-12 13:10:39,173 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-12 13:10:53,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2024-08-12 13:10:54,064 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 13:11:13,200 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5700, loss[loss=0.09743, beats_loss=0.01046, ecapa_loss=0.0002033, whisper_loss=0.08494, over 19564.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01096, ecapa_loss=0.0001763, whisper_loss=0.09254, over 3892923.42 frames. ], batch size: 84, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:11:16,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-12 13:11:25,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1651110.0, ans=0.125 2024-08-12 13:11:27,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.72 vs. 
limit=10.0 2024-08-12 13:11:28,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1651210.0, ans=0.5 2024-08-12 13:11:31,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.530e+01 2.812e+01 3.253e+01 9.696e+01, threshold=5.623e+01, percent-clipped=1.0 2024-08-12 13:11:41,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1651210.0, ans=0.07 2024-08-12 13:11:51,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1651310.0, ans=0.0 2024-08-12 13:11:52,261 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.375e-01 2024-08-12 13:12:20,361 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 13:12:30,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5750, loss[loss=0.09358, beats_loss=0.009272, ecapa_loss=0.0002048, whisper_loss=0.08226, over 15534.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01096, ecapa_loss=0.0001752, whisper_loss=0.0923, over 3896688.62 frames. ], batch size: 64, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:12:52,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1651710.0, ans=0.0 2024-08-12 13:13:13,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1651810.0, ans=0.125 2024-08-12 13:13:25,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1651910.0, ans=0.125 2024-08-12 13:13:45,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5800, loss[loss=0.1222, beats_loss=0.008786, ecapa_loss=0.0002322, whisper_loss=0.1111, over 21775.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001782, whisper_loss=0.09159, over 3900908.63 frames. ], batch size: 90, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:14:04,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.480e+01 2.682e+01 3.175e+01 5.563e+01, threshold=5.365e+01, percent-clipped=0.0 2024-08-12 13:14:15,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1652310.0, ans=0.125 2024-08-12 13:14:16,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1652310.0, ans=0.125 2024-08-12 13:14:28,757 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 13:14:33,069 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-12 13:14:50,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1652510.0, ans=0.0 2024-08-12 13:14:57,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.42 vs. limit=22.5 2024-08-12 13:15:02,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1652510.0, ans=0.125 2024-08-12 13:15:05,758 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5850, loss[loss=0.106, beats_loss=0.01147, ecapa_loss=0.0001689, whisper_loss=0.09287, over 22105.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001774, whisper_loss=0.09142, over 3914148.50 frames. 
], batch size: 88, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:15:10,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1652610.0, ans=0.125 2024-08-12 13:15:19,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1652610.0, ans=0.125 2024-08-12 13:15:23,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1652710.0, ans=0.0 2024-08-12 13:15:23,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5 2024-08-12 13:15:54,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-12 13:16:20,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1653010.0, ans=0.125 2024-08-12 13:16:25,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5900, loss[loss=0.08484, beats_loss=0.01305, ecapa_loss=0.0001575, whisper_loss=0.07021, over 17774.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001774, whisper_loss=0.09184, over 3925582.58 frames. ], batch size: 71, lr: 5.28e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:16:25,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1653110.0, ans=0.2 2024-08-12 13:16:26,716 INFO [train_multi_KD3.py:844] (0/4) A total of 98 cuts. 21 from LS+wenet, 24 from Vox, 53 fro AS 2024-08-12 13:16:31,324 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
15 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 13:16:43,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.413e+01 2.728e+01 2.999e+01 4.140e+01, threshold=5.456e+01, percent-clipped=0.0 2024-08-12 13:16:58,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1653310.0, ans=0.2 2024-08-12 13:17:21,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=12.0 2024-08-12 13:17:42,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-12 13:17:43,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 5950, loss[loss=0.1015, beats_loss=0.01196, ecapa_loss=0.000154, whisper_loss=0.08803, over 14349.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001781, whisper_loss=0.09126, over 3883457.83 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:18:01,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1653710.0, ans=0.0 2024-08-12 13:18:11,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1653710.0, ans=0.1 2024-08-12 13:18:16,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-12 13:18:41,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1653910.0, ans=0.0 2024-08-12 13:18:47,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1654010.0, ans=0.125 2024-08-12 13:18:49,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-08-12 13:18:58,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1654010.0, ans=0.125 2024-08-12 13:19:03,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6000, loss[loss=0.1198, beats_loss=0.01028, ecapa_loss=0.0001529, whisper_loss=0.108, over 15778.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001765, whisper_loss=0.09166, over 3889437.62 frames. ], batch size: 62, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:19:03,592 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 13:19:40,036 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005888, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 13:19:58,366 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on SV_voxceleb1: loss=0.004729, beats_loss=0, ecapa_loss=0.0004729, whisper_loss=0, over 939242.00 frames. 2024-08-12 13:21:43,828 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on AT_audioset: loss=0.02432, beats_loss=0.02432, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 13:21:43,833 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 13:21:54,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.16 vs. 
limit=10.0 2024-08-12 13:21:57,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1654110.0, ans=0.0 2024-08-12 13:22:00,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1654210.0, ans=0.0 2024-08-12 13:22:03,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.605e+01 2.854e+01 3.270e+01 6.510e+01, threshold=5.707e+01, percent-clipped=1.0 2024-08-12 13:22:09,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1654210.0, ans=0.2 2024-08-12 13:22:15,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1654310.0, ans=0.125 2024-08-12 13:22:23,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1654310.0, ans=0.125 2024-08-12 13:22:28,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1654310.0, ans=0.2 2024-08-12 13:22:33,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1654410.0, ans=0.125 2024-08-12 13:22:52,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1654510.0, ans=0.0 2024-08-12 13:22:52,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1654510.0, ans=0.0 2024-08-12 13:23:00,718 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 13:23:02,943 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6050, loss[loss=0.1038, beats_loss=0.01019, ecapa_loss=0.0001588, whisper_loss=0.09201, over 18742.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001766, whisper_loss=0.0916, over 3880829.56 frames. ], batch size: 73, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:23:18,322 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-12 13:23:18,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1654710.0, ans=0.1 2024-08-12 13:23:35,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1654810.0, ans=0.1 2024-08-12 13:23:42,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1654810.0, ans=0.0 2024-08-12 13:23:43,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=15.0 2024-08-12 13:23:50,431 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 13:23:52,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1654910.0, ans=0.2 2024-08-12 13:23:57,924 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 13:23:58,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1654910.0, ans=0.2 2024-08-12 13:24:08,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1655010.0, ans=0.125 2024-08-12 13:24:10,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1655010.0, ans=0.0 2024-08-12 13:24:10,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1655010.0, ans=0.1 2024-08-12 13:24:23,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6100, loss[loss=0.1215, beats_loss=0.01142, ecapa_loss=0.0001861, whisper_loss=0.1082, over 22830.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001775, whisper_loss=0.09175, over 3889690.29 frames. ], batch size: 94, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:24:23,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1655110.0, ans=0.05 2024-08-12 13:24:23,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1655110.0, ans=0.5 2024-08-12 13:24:27,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1655110.0, ans=0.0 2024-08-12 13:24:34,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=15.0 2024-08-12 13:24:38,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1655210.0, ans=0.125 2024-08-12 13:24:42,404 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+01 2.433e+01 2.687e+01 2.996e+01 4.596e+01, threshold=5.373e+01, percent-clipped=0.0 2024-08-12 13:24:45,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1655210.0, ans=0.125 2024-08-12 13:25:04,891 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-12 13:25:20,673 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 13:25:20,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1655410.0, ans=0.0 2024-08-12 13:25:21,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2024-08-12 13:25:30,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1655510.0, ans=0.0 2024-08-12 13:25:36,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1655510.0, ans=0.025 2024-08-12 13:25:38,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1655510.0, ans=0.0 2024-08-12 13:25:42,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6150, loss[loss=0.1009, beats_loss=0.009649, ecapa_loss=0.0001655, whisper_loss=0.08955, over 15335.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.000178, whisper_loss=0.09169, over 3900552.11 frames. 
], batch size: 59, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:25:51,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1655610.0, ans=0.0 2024-08-12 13:25:59,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1655710.0, ans=0.125 2024-08-12 13:26:19,647 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 13:26:24,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-08-12 13:26:26,755 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-12 13:26:47,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-08-12 13:26:57,711 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 13:27:01,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6200, loss[loss=0.1092, beats_loss=0.008826, ecapa_loss=0.0001504, whisper_loss=0.09889, over 16255.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.000178, whisper_loss=0.09185, over 3862803.56 frames. ], batch size: 60, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:27:17,052 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 13:27:21,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.566e+01 2.980e+01 3.459e+01 1.302e+02, threshold=5.960e+01, percent-clipped=3.0 2024-08-12 13:27:41,548 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 13:27:55,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1656410.0, ans=0.1 2024-08-12 13:27:57,452 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 35 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 13:28:13,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1656510.0, ans=0.125 2024-08-12 13:28:20,358 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6250, loss[loss=0.105, beats_loss=0.008318, ecapa_loss=0.000182, whisper_loss=0.0949, over 17513.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01108, ecapa_loss=0.0001761, whisper_loss=0.09102, over 3892041.21 frames. ], batch size: 68, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:28:22,417 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 13:28:31,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1656610.0, ans=0.1 2024-08-12 13:28:35,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1656710.0, ans=0.125 2024-08-12 13:28:54,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0 2024-08-12 13:29:01,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1656810.0, ans=0.125 2024-08-12 13:29:18,773 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 13:29:34,232 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 13:29:38,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6300, loss[loss=0.07205, beats_loss=0.01454, ecapa_loss=0.000136, whisper_loss=0.05615, over 15446.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01105, ecapa_loss=0.0001771, whisper_loss=0.09162, over 3884737.80 frames. ], batch size: 61, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:29:40,234 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 13:29:47,960 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 13:29:48,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1657110.0, ans=0.0 2024-08-12 13:29:56,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.136e+01 2.533e+01 2.764e+01 3.173e+01 6.844e+01, threshold=5.528e+01, percent-clipped=1.0 2024-08-12 13:30:00,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2024-08-12 13:30:06,067 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-12 13:30:07,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1657310.0, ans=0.125 2024-08-12 13:30:15,338 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-12 13:30:15,630 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.829e+00 2024-08-12 13:30:38,202 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 13:30:38,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1657510.0, ans=0.0 2024-08-12 13:30:40,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1657510.0, ans=0.0 2024-08-12 13:30:53,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1657610.0, ans=0.125 2024-08-12 13:30:54,842 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6350, loss[loss=0.09552, beats_loss=0.01205, ecapa_loss=0.0001521, whisper_loss=0.08195, over 17026.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01103, ecapa_loss=0.0001773, whisper_loss=0.09167, over 3867248.28 frames. ], batch size: 67, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:31:02,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=12.0 2024-08-12 13:31:06,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1657610.0, ans=0.0 2024-08-12 13:31:10,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1657710.0, ans=0.0 2024-08-12 13:31:12,229 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 13:31:27,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5 2024-08-12 13:31:39,161 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 13:31:43,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1657910.0, ans=0.0 2024-08-12 13:31:51,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1657910.0, ans=0.1 2024-08-12 13:31:54,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2024-08-12 13:32:11,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6400, loss[loss=0.1035, beats_loss=0.01271, ecapa_loss=0.0001275, whisper_loss=0.08955, over 14902.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001771, whisper_loss=0.09252, over 3844592.63 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:32:29,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.462e+01 2.715e+01 3.060e+01 4.478e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 13:32:35,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1658210.0, ans=0.5 2024-08-12 13:32:42,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1658310.0, ans=0.95 2024-08-12 13:32:45,759 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 38 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 13:32:57,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.35 vs. 
limit=15.0 2024-08-12 13:32:59,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1658410.0, ans=0.95 2024-08-12 13:33:01,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1658410.0, ans=0.0 2024-08-12 13:33:04,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1658410.0, ans=0.0 2024-08-12 13:33:12,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0 2024-08-12 13:33:22,558 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-12 13:33:26,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6450, loss[loss=0.1045, beats_loss=0.01013, ecapa_loss=0.0001393, whisper_loss=0.09297, over 20765.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01094, ecapa_loss=0.0001764, whisper_loss=0.09298, over 3894744.26 frames. ], batch size: 78, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:33:37,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1658610.0, ans=0.1 2024-08-12 13:33:49,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-08-12 13:34:00,954 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 13:34:07,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. 
limit=15.0 2024-08-12 13:34:15,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1658910.0, ans=0.0 2024-08-12 13:34:18,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1658910.0, ans=0.125 2024-08-12 13:34:33,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1659010.0, ans=0.1 2024-08-12 13:34:41,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6500, loss[loss=0.0971, beats_loss=0.01181, ecapa_loss=0.0002031, whisper_loss=0.08325, over 21216.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01099, ecapa_loss=0.0001764, whisper_loss=0.09262, over 3878747.74 frames. ], batch size: 91, lr: 5.27e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:34:41,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1659110.0, ans=0.0 2024-08-12 13:34:49,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1659110.0, ans=0.1 2024-08-12 13:34:54,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2024-08-12 13:34:58,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+01 2.642e+01 2.943e+01 3.228e+01 1.281e+02, threshold=5.885e+01, percent-clipped=1.0 2024-08-12 13:35:15,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1659310.0, ans=0.125 2024-08-12 13:35:18,268 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 13:35:20,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1659310.0, ans=0.0 2024-08-12 13:35:20,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1659310.0, ans=0.1 2024-08-12 13:35:30,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1659410.0, ans=0.1 2024-08-12 13:35:48,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1659510.0, ans=0.2 2024-08-12 13:35:55,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6550, loss[loss=0.1056, beats_loss=0.01084, ecapa_loss=0.0001743, whisper_loss=0.09297, over 15049.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.0001778, whisper_loss=0.0923, over 3855051.24 frames. ], batch size: 60, lr: 5.26e-03, grad_scale: 5.764607523034235e+17 2024-08-12 13:36:00,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1659610.0, ans=0.125 2024-08-12 13:36:03,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.11 vs. limit=22.5 2024-08-12 13:36:21,237 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 13:36:32,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1659810.0, ans=0.125 2024-08-12 13:36:33,557 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 13:36:43,818 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 13:36:58,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1660010.0, ans=0.0
2024-08-12 13:37:03,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1660010.0, ans=0.0
2024-08-12 13:37:10,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6600, loss[loss=0.1195, beats_loss=0.01027, ecapa_loss=0.0001813, whisper_loss=0.1074, over 14888.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01096, ecapa_loss=0.0001783, whisper_loss=0.09308, over 3912128.62 frames. ], batch size: 58, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:37:10,865 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 from AS
2024-08-12 13:37:15,095 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-12 13:37:19,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1660110.0, ans=0.125
2024-08-12 13:37:22,532 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 25 from Vox, 22 from AS
2024-08-12 13:37:28,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.664e+01 3.043e+01 3.449e+01 7.276e+01, threshold=6.087e+01, percent-clipped=1.0
2024-08-12 13:37:31,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1660210.0, ans=0.1
2024-08-12 13:37:33,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1660210.0, ans=0.2
2024-08-12 13:37:39,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5
2024-08-12 13:37:46,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1660310.0, ans=0.125
2024-08-12 13:37:47,549 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 17 from LS+wenet, 20 from Vox, 42 from AS
2024-08-12 13:37:57,872 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 15 from LS+wenet, 24 from Vox, 37 from AS
2024-08-12 13:38:23,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6650, loss[loss=0.1059, beats_loss=0.01332, ecapa_loss=0.0001396, whisper_loss=0.09118, over 22528.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0001777, whisper_loss=0.09281, over 3902382.62 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:38:25,097 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 22 from Vox, 48 from AS
2024-08-12 13:38:38,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1660710.0, ans=0.1
2024-08-12 13:38:57,235 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 from AS
2024-08-12 13:38:58,730 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 from AS
2024-08-12 13:39:05,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5
2024-08-12 13:39:14,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0
2024-08-12 13:39:15,042 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-12 13:39:27,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1661010.0, ans=0.2
2024-08-12 13:39:29,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=12.0
2024-08-12 13:39:35,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6700, loss[loss=0.0902, beats_loss=0.01331, ecapa_loss=0.0001239, whisper_loss=0.07565, over 16547.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01101, ecapa_loss=0.000176, whisper_loss=0.09292, over 3895153.74 frames. ], batch size: 64, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:39:43,878 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 from AS
2024-08-12 13:39:46,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1661110.0, ans=0.125
2024-08-12 13:39:50,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1661210.0, ans=0.2
2024-08-12 13:39:52,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.059e+01 2.550e+01 2.931e+01 3.277e+01 4.693e+01, threshold=5.862e+01, percent-clipped=0.0
2024-08-12 13:39:58,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1661210.0, ans=0.125
2024-08-12 13:40:12,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1661310.0, ans=0.1
2024-08-12 13:40:15,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1661310.0, ans=0.125
2024-08-12 13:40:16,614 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 from AS
2024-08-12 13:40:40,366 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS
2024-08-12 13:40:46,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1661610.0, ans=0.125
2024-08-12 13:40:47,696 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6750, loss[loss=0.09835, beats_loss=0.01405, ecapa_loss=0.0001394, whisper_loss=0.08291, over 23258.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001757, whisper_loss=0.09283, over 3876155.97 frames. ], batch size: 93, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:40:54,471 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 from AS
2024-08-12 13:40:57,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1661610.0, ans=0.0
2024-08-12 13:40:59,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1661610.0, ans=0.125
2024-08-12 13:41:07,768 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 from AS
2024-08-12 13:41:12,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1661710.0, ans=0.0
2024-08-12 13:41:13,442 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 from AS
2024-08-12 13:41:37,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1661910.0, ans=0.125
2024-08-12 13:41:39,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1661910.0, ans=0.125
2024-08-12 13:41:40,694 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 from AS
2024-08-12 13:41:58,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6800, loss[loss=0.09808, beats_loss=0.01259, ecapa_loss=0.000208, whisper_loss=0.0834, over 22230.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01095, ecapa_loss=0.0001757, whisper_loss=0.09298, over 3860508.23 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:42:03,936 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 from AS
2024-08-12 13:42:06,889 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 from AS
2024-08-12 13:42:12,310 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 from AS
2024-08-12 13:42:14,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.516e+01 2.723e+01 3.027e+01 3.885e+01, threshold=5.446e+01, percent-clipped=0.0
2024-08-12 13:42:17,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1662210.0, ans=0.015
2024-08-12 13:42:20,721 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 from AS
2024-08-12 13:42:26,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1662310.0, ans=0.1
2024-08-12 13:42:27,776 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS
2024-08-12 13:42:38,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0
2024-08-12 13:42:49,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.71 vs. limit=22.5
2024-08-12 13:42:54,108 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 from AS
2024-08-12 13:42:54,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1662510.0, ans=0.2
2024-08-12 13:43:08,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6850, loss[loss=0.1026, beats_loss=0.01177, ecapa_loss=0.0001912, whisper_loss=0.08897, over 21071.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01101, ecapa_loss=0.0001749, whisper_loss=0.09191, over 3847013.36 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:43:42,635 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 12 from Vox, 38 from AS
2024-08-12 13:43:44,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1662810.0, ans=0.125
2024-08-12 13:43:52,743 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 from AS
2024-08-12 13:43:55,876 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.459e-02
2024-08-12 13:43:59,561 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 15 from LS+wenet, 22 from Vox, 30 from AS
2024-08-12 13:44:05,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1663010.0, ans=0.2
2024-08-12 13:44:06,715 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 from AS
2024-08-12 13:44:12,328 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 from AS
2024-08-12 13:44:17,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1663010.0, ans=0.0
2024-08-12 13:44:19,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6900, loss[loss=0.08466, beats_loss=0.008498, ecapa_loss=0.0001988, whisper_loss=0.07417, over 14110.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01108, ecapa_loss=0.0001749, whisper_loss=0.09148, over 3851957.10 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:44:29,285 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 from AS
2024-08-12 13:44:35,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.437e+01 2.665e+01 2.983e+01 5.492e+01, threshold=5.330e+01, percent-clipped=1.0
2024-08-12 13:44:38,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0
2024-08-12 13:44:42,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1663210.0, ans=0.0
2024-08-12 13:44:44,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1663210.0, ans=0.125
2024-08-12 13:45:00,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1663410.0, ans=0.0
2024-08-12 13:45:29,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 6950, loss[loss=0.09949, beats_loss=0.01203, ecapa_loss=0.0002191, whisper_loss=0.08527, over 17748.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01105, ecapa_loss=0.0001749, whisper_loss=0.09216, over 3862043.30 frames. ], batch size: 76, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:45:40,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1663610.0, ans=0.125
2024-08-12 13:45:47,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=1663710.0, ans=12.0
2024-08-12 13:45:54,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0
2024-08-12 13:45:55,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1663710.0, ans=0.0
2024-08-12 13:46:16,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1663910.0, ans=0.125
2024-08-12 13:46:18,304 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 23 from Vox, 27 from AS
2024-08-12 13:46:26,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1664010.0, ans=0.0
2024-08-12 13:46:37,722 INFO [train_multi_KD3.py:844] (0/4) A total of 100 cuts. 27 from LS+wenet, 21 from Vox, 52 from AS
2024-08-12 13:46:42,869 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7000, loss[loss=0.1003, beats_loss=0.01209, ecapa_loss=0.000186, whisper_loss=0.08636, over 21770.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001756, whisper_loss=0.09225, over 3873401.01 frames. ], batch size: 87, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:46:46,231 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 from AS
2024-08-12 13:46:47,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1664110.0, ans=0.125
2024-08-12 13:46:52,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1664110.0, ans=0.0
2024-08-12 13:46:52,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1664110.0, ans=0.125
2024-08-12 13:46:52,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=1664110.0, ans=22.5
2024-08-12 13:47:00,007 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.453e+01 2.767e+01 3.334e+01 1.862e+02, threshold=5.533e+01, percent-clipped=4.0
2024-08-12 13:47:04,789 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 26 from Vox, 37 from AS
2024-08-12 13:47:15,010 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 27 from Vox, 37 from AS
2024-08-12 13:47:23,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1664310.0, ans=0.125
2024-08-12 13:47:26,453 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 25 from Vox, 40 from AS
2024-08-12 13:47:35,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1664410.0, ans=0.125
2024-08-12 13:47:51,310 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 from AS
2024-08-12 13:47:55,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7050, loss[loss=0.1221, beats_loss=0.01168, ecapa_loss=0.000159, whisper_loss=0.1088, over 18293.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0111, ecapa_loss=0.0001752, whisper_loss=0.09202, over 3893723.66 frames. ], batch size: 70, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:48:13,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=12.0
2024-08-12 13:48:23,401 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 25 from Vox, 44 from AS
2024-08-12 13:48:32,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=10.0
2024-08-12 13:48:50,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0
2024-08-12 13:48:54,396 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 from AS
2024-08-12 13:49:09,043 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7100, loss[loss=0.08972, beats_loss=0.01227, ecapa_loss=0.0001607, whisper_loss=0.07584, over 17623.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001754, whisper_loss=0.09214, over 3877812.40 frames. ], batch size: 73, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:49:14,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1665110.0, ans=0.0
2024-08-12 13:49:19,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1665110.0, ans=0.1
2024-08-12 13:49:25,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.512e+01 2.817e+01 3.133e+01 5.318e+01, threshold=5.634e+01, percent-clipped=0.0
2024-08-12 13:50:01,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1665410.0, ans=0.1
2024-08-12 13:50:06,429 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 13:50:09,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1665510.0, ans=0.0
2024-08-12 13:50:14,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1665510.0, ans=0.1
2024-08-12 13:50:22,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7150, loss[loss=0.09986, beats_loss=0.01164, ecapa_loss=0.0002151, whisper_loss=0.08607, over 22679.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01105, ecapa_loss=0.0001762, whisper_loss=0.09169, over 3906606.03 frames. ], batch size: 95, lr: 5.26e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:51:06,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1665910.0, ans=0.125
2024-08-12 13:51:14,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1665910.0, ans=0.1
2024-08-12 13:51:23,103 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS
2024-08-12 13:51:31,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1666010.0, ans=0.125
2024-08-12 13:51:36,167 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7200, loss[loss=0.09665, beats_loss=0.01293, ecapa_loss=0.0001491, whisper_loss=0.08223, over 23107.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01118, ecapa_loss=0.0001745, whisper_loss=0.09075, over 3920642.25 frames. ], batch size: 92, lr: 5.25e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:51:39,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.869e-02
2024-08-12 13:51:46,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1666110.0, ans=0.0
2024-08-12 13:51:48,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1666110.0, ans=0.2
2024-08-12 13:51:53,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.582e+01 2.995e+01 3.267e+01 4.717e+01, threshold=5.989e+01, percent-clipped=0.0
2024-08-12 13:52:15,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1666310.0, ans=0.0
2024-08-12 13:52:32,094 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 from AS
2024-08-12 13:52:48,658 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7250, loss[loss=0.1183, beats_loss=0.01037, ecapa_loss=0.000185, whisper_loss=0.1061, over 22120.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01114, ecapa_loss=0.000175, whisper_loss=0.09059, over 3919277.88 frames. ], batch size: 87, lr: 5.25e-03, grad_scale: 5.764607523034235e+17
2024-08-12 13:52:55,144 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 from AS
2024-08-12 13:52:58,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1666610.0, ans=10.0
2024-08-12 13:53:03,477 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS
2024-08-12 13:53:03,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1666710.0, ans=0.125
2024-08-12 13:53:21,643 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 23 from Vox, 28 from AS
2024-08-12 13:53:26,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1666810.0, ans=0.125
2024-08-12 13:53:31,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1666810.0, ans=0.125
2024-08-12 13:53:40,034 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 from AS
2024-08-12 13:53:55,047 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 12 from LS+wenet, 19 from Vox, 27 from AS
2024-08-12 13:54:02,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1667010.0, ans=0.125
2024-08-12 13:54:05,188 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7300, loss[loss=0.1015, beats_loss=0.01106, ecapa_loss=0.0001865, whisper_loss=0.08859, over 19951.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01108, ecapa_loss=0.0001751, whisper_loss=0.09139, over 3912784.42 frames. ], batch size: 81, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 13:54:24,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.453e+01 2.736e+01 3.058e+01 4.580e+01, threshold=5.471e+01, percent-clipped=0.0
2024-08-12 13:54:41,435 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 from AS
2024-08-12 13:54:46,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1667310.0, ans=0.125
2024-08-12 13:54:55,365 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 from AS
2024-08-12 13:54:59,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0
2024-08-12 13:55:02,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5
2024-08-12 13:55:11,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1667510.0, ans=0.1
2024-08-12 13:55:17,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1667510.0, ans=0.1
2024-08-12 13:55:23,502 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7350, loss[loss=0.09743, beats_loss=0.01046, ecapa_loss=0.0001822, whisper_loss=0.08515, over 16191.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01106, ecapa_loss=0.0001771, whisper_loss=0.09069, over 3895259.30 frames. ], batch size: 66, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 13:55:25,722 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS
2024-08-12 13:55:26,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=12.0
2024-08-12 13:55:34,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1667610.0, ans=0.2
2024-08-12 13:55:52,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1667810.0, ans=0.0
2024-08-12 13:55:52,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1667810.0, ans=0.0
2024-08-12 13:55:59,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1667810.0, ans=0.0
2024-08-12 13:56:08,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5
2024-08-12 13:56:16,297 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 from AS
2024-08-12 13:56:20,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1667910.0, ans=0.1
2024-08-12 13:56:21,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1667910.0, ans=0.125
2024-08-12 13:56:22,799 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 25 from LS+wenet, 16 from Vox, 16 from AS
2024-08-12 13:56:25,748 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 from AS
2024-08-12 13:56:41,158 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7400, loss[loss=0.06645, beats_loss=0.01419, ecapa_loss=0.0001684, whisper_loss=0.05058, over 18161.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01113, ecapa_loss=0.0001771, whisper_loss=0.09019, over 3873731.12 frames. ], batch size: 78, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 13:56:44,488 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 13:56:58,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.596e+01 2.915e+01 3.233e+01 4.650e+01, threshold=5.831e+01, percent-clipped=0.0
2024-08-12 13:57:25,996 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 32 from Vox, 28 from AS
2024-08-12 13:57:32,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0
2024-08-12 13:57:33,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1668410.0, ans=0.125
2024-08-12 13:57:38,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2024-08-12 13:57:41,292 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS
2024-08-12 13:57:49,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0
2024-08-12 13:57:51,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5
2024-08-12 13:57:52,050 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 from AS
2024-08-12 13:57:54,787 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 10 from Vox, 31 from AS
2024-08-12 13:57:55,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7450, loss[loss=0.1118, beats_loss=0.01118, ecapa_loss=0.0001338, whisper_loss=0.09925, over 17210.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.0001762, whisper_loss=0.09133, over 3907145.86 frames. ], batch size: 65, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 13:58:01,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1668610.0, ans=0.0
2024-08-12 13:58:21,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=1668710.0, ans=15.0
2024-08-12 13:58:45,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0
2024-08-12 13:58:52,772 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 from AS
2024-08-12 13:58:58,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0
2024-08-12 13:59:12,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7500, loss[loss=0.09002, beats_loss=0.01224, ecapa_loss=0.0002154, whisper_loss=0.07563, over 17895.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001765, whisper_loss=0.09115, over 3917583.62 frames. ], batch size: 75, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 13:59:30,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.570e+01 2.878e+01 3.293e+01 5.497e+01, threshold=5.755e+01, percent-clipped=0.0
2024-08-12 13:59:36,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.14 vs. limit=22.5
2024-08-12 13:59:45,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1669310.0, ans=0.125
2024-08-12 13:59:52,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1669310.0, ans=0.125
2024-08-12 14:00:02,845 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS
2024-08-12 14:00:04,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1669410.0, ans=0.125
2024-08-12 14:00:07,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1669410.0, ans=0.0
2024-08-12 14:00:15,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1669510.0, ans=0.125
2024-08-12 14:00:16,110 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 from AS
2024-08-12 14:00:26,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7550, loss[loss=0.09892, beats_loss=0.01172, ecapa_loss=0.000196, whisper_loss=0.08524, over 15419.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01105, ecapa_loss=0.0001763, whisper_loss=0.09124, over 3906864.45 frames. ], batch size: 63, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 14:00:34,567 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 from AS
2024-08-12 14:00:36,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1669610.0, ans=0.2
2024-08-12 14:00:42,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1669710.0, ans=0.125
2024-08-12 14:00:52,024 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.931e-01
2024-08-12 14:00:54,415 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 from AS
2024-08-12 14:00:54,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1669810.0, ans=0.05
2024-08-12 14:00:56,038 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 from AS
2024-08-12 14:00:57,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1669810.0, ans=0.125
2024-08-12 14:01:20,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1669910.0, ans=0.0
2024-08-12 14:01:22,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1669910.0, ans=0.1
2024-08-12 14:01:33,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1670010.0, ans=0.125
2024-08-12 14:01:35,095 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 from AS
2024-08-12 14:01:35,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1670010.0, ans=0.125
2024-08-12 14:01:41,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7600, loss[loss=0.1153, beats_loss=0.0117, ecapa_loss=0.0001708, whisper_loss=0.1019, over 14873.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01102, ecapa_loss=0.0001769, whisper_loss=0.09104, over 3857319.83 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 14:01:44,016 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 from AS
2024-08-12 14:01:47,549 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 from AS
2024-08-12 14:01:52,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=12.0
2024-08-12 14:01:59,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.500e+01 2.707e+01 3.102e+01 5.200e+01, threshold=5.414e+01, percent-clipped=0.0
2024-08-12 14:02:22,361 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 14:02:22,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.01 vs. limit=10.0
2024-08-12 14:02:29,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1670410.0, ans=0.1
2024-08-12 14:02:46,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.39 vs. limit=15.0
2024-08-12 14:02:54,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1670510.0, ans=0.125
2024-08-12 14:02:56,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1670610.0, ans=0.2
2024-08-12 14:02:57,294 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7650, loss[loss=0.0824, beats_loss=0.01137, ecapa_loss=0.0002158, whisper_loss=0.06887, over 22248.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01095, ecapa_loss=0.0001775, whisper_loss=0.09112, over 3853217.70 frames. ], batch size: 96, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 14:02:57,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=1670610.0, ans=15.0
2024-08-12 14:03:25,065 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 18 from LS+wenet, 19 from Vox, 34 from AS
2024-08-12 14:03:32,419 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS
2024-08-12 14:03:43,023 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 from AS
2024-08-12 14:03:44,364 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS
2024-08-12 14:04:06,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1671010.0, ans=0.0
2024-08-12 14:04:14,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7700, loss[loss=0.1033, beats_loss=0.01163, ecapa_loss=0.0001705, whisper_loss=0.08994, over 22866.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001771, whisper_loss=0.09153, over 3850442.94 frames. ], batch size: 91, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 14:04:16,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1671110.0, ans=0.2
2024-08-12 14:04:26,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1671110.0, ans=0.1
2024-08-12 14:04:29,352 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 21 from LS+wenet, 18 from Vox, 55 from AS
2024-08-12 14:04:33,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.546e+01 2.810e+01 3.287e+01 1.654e+02, threshold=5.620e+01, percent-clipped=2.0
2024-08-12 14:04:36,852 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS
2024-08-12 14:04:40,253 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 from AS
2024-08-12 14:04:41,724 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 26 from Vox, 29 from AS
2024-08-12 14:05:29,256 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 from AS
2024-08-12 14:05:31,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1671510.0, ans=0.125
2024-08-12 14:05:36,111 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7750, loss[loss=0.1126, beats_loss=0.01032, ecapa_loss=0.0001857, whisper_loss=0.1004, over 22848.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01108, ecapa_loss=0.0001748, whisper_loss=0.09135, over 3862619.66 frames. ], batch size: 91, lr: 5.25e-03, grad_scale: 1.152921504606847e+18
2024-08-12 14:05:38,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0
2024-08-12 14:06:19,098 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts.
24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-12 14:06:31,853 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 14:06:40,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1671910.0, ans=0.0 2024-08-12 14:07:02,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7800, loss[loss=0.1073, beats_loss=0.01046, ecapa_loss=0.000136, whisper_loss=0.09547, over 15176.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001753, whisper_loss=0.09112, over 3857736.05 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:07:09,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1672110.0, ans=0.125 2024-08-12 14:07:19,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1672210.0, ans=0.125 2024-08-12 14:07:23,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.562e+01 2.777e+01 3.107e+01 5.363e+01, threshold=5.555e+01, percent-clipped=0.0 2024-08-12 14:07:45,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1672310.0, ans=0.2 2024-08-12 14:08:01,356 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 14:08:28,798 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7850, loss[loss=0.1052, beats_loss=0.01185, ecapa_loss=0.000136, whisper_loss=0.092, over 20567.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001742, whisper_loss=0.09184, over 3848579.65 frames. ], batch size: 78, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:08:38,639 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
12 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 14:08:57,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=15.0 2024-08-12 14:08:59,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1672710.0, ans=0.125 2024-08-12 14:09:13,994 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 14:09:27,272 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 14:09:27,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1672910.0, ans=0.125 2024-08-12 14:09:29,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1672910.0, ans=0.0 2024-08-12 14:09:44,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1673010.0, ans=0.125 2024-08-12 14:09:47,460 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 14:09:48,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.89 vs. limit=22.5 2024-08-12 14:09:49,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1673010.0, ans=0.125 2024-08-12 14:09:59,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7900, loss[loss=0.1186, beats_loss=0.008972, ecapa_loss=0.0001784, whisper_loss=0.1078, over 15405.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001736, whisper_loss=0.09148, over 3845141.10 frames. 
], batch size: 57, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:10:11,512 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 14:10:13,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1673210.0, ans=0.125 2024-08-12 14:10:13,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1673210.0, ans=0.0 2024-08-12 14:10:17,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.710e+01 2.918e+01 3.314e+01 4.550e+01, threshold=5.837e+01, percent-clipped=0.0 2024-08-12 14:10:44,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-12 14:10:51,499 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 17 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-12 14:11:12,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1673510.0, ans=0.0 2024-08-12 14:11:18,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 7950, loss[loss=0.1142, beats_loss=0.01005, ecapa_loss=0.0001514, whisper_loss=0.1026, over 18882.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01106, ecapa_loss=0.0001743, whisper_loss=0.09155, over 3852439.09 frames. ], batch size: 71, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:11:47,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-12 14:12:04,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1673810.0, ans=0.125 2024-08-12 14:12:08,802 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 14:12:28,603 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-12 14:12:35,577 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 14:12:47,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8000, loss[loss=0.1136, beats_loss=0.01075, ecapa_loss=0.0001715, whisper_loss=0.1011, over 21781.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01104, ecapa_loss=0.0001749, whisper_loss=0.09176, over 3878686.53 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:12:50,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1674110.0, ans=0.0 2024-08-12 14:13:03,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1674210.0, ans=0.125 2024-08-12 14:13:07,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.597e+01 2.930e+01 3.466e+01 8.592e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 14:13:08,024 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 14:13:20,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.59 vs. limit=10.0 2024-08-12 14:13:24,974 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 14:13:29,303 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 10 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 14:13:37,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1674410.0, ans=0.1 2024-08-12 14:13:38,821 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 14:13:40,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1674410.0, ans=0.1 2024-08-12 14:13:42,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1674410.0, ans=0.125 2024-08-12 14:14:11,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1674510.0, ans=0.125 2024-08-12 14:14:16,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8050, loss[loss=0.09262, beats_loss=0.01102, ecapa_loss=0.0002083, whisper_loss=0.07952, over 18934.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001752, whisper_loss=0.0919, over 3873969.44 frames. ], batch size: 82, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:14:28,760 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 14:14:40,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1674710.0, ans=0.1 2024-08-12 14:15:09,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1674810.0, ans=0.125 2024-08-12 14:15:22,256 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-12 14:15:45,331 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 14:15:51,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8100, loss[loss=0.1196, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.1076, over 23879.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01106, ecapa_loss=0.0001735, whisper_loss=0.09193, over 3924489.55 frames. 
], batch size: 93, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:16:07,711 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 14:16:12,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.342e+01 2.574e+01 2.867e+01 4.166e+01, threshold=5.148e+01, percent-clipped=0.0 2024-08-12 14:16:37,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1675310.0, ans=0.125 2024-08-12 14:17:02,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1675510.0, ans=0.1 2024-08-12 14:17:15,516 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.880e+02 2024-08-12 14:17:18,089 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8150, loss[loss=0.1208, beats_loss=0.009483, ecapa_loss=0.0001802, whisper_loss=0.1095, over 23289.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01094, ecapa_loss=0.0001742, whisper_loss=0.09268, over 3952227.51 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:17:31,335 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 14:17:46,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1675710.0, ans=0.125 2024-08-12 14:18:05,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-08-12 14:18:20,112 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 14:18:21,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1675910.0, ans=0.125 2024-08-12 14:18:46,484 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 14:18:49,425 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 14:18:50,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8200, loss[loss=0.1217, beats_loss=0.008562, ecapa_loss=0.0001868, whisper_loss=0.1112, over 23201.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0109, ecapa_loss=0.0001754, whisper_loss=0.0928, over 3951896.73 frames. ], batch size: 87, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:18:53,301 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 14:19:11,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1676210.0, ans=0.125 2024-08-12 14:19:12,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.595e+01 2.929e+01 3.219e+01 5.675e+01, threshold=5.858e+01, percent-clipped=2.0 2024-08-12 14:19:46,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1676410.0, ans=0.125 2024-08-12 14:19:53,362 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 14:19:55,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1676410.0, ans=0.0 2024-08-12 14:20:00,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1676510.0, ans=0.2 2024-08-12 14:20:14,417 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-12 14:20:15,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8250, loss[loss=0.1148, beats_loss=0.009946, ecapa_loss=0.0002522, whisper_loss=0.1023, over 19703.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01089, ecapa_loss=0.0001768, whisper_loss=0.09298, over 3961642.76 frames. ], batch size: 86, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:20:16,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1676610.0, ans=0.95 2024-08-12 14:20:22,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1676610.0, ans=0.125 2024-08-12 14:20:33,154 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 16 from Vox, 54 fro AS 2024-08-12 14:20:39,474 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 14:20:47,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1676710.0, ans=0.125 2024-08-12 14:20:54,987 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 14:21:01,315 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 14:21:05,100 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 14:21:28,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1677010.0, ans=0.125 2024-08-12 14:21:30,417 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
22 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-12 14:21:42,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1677010.0, ans=0.0 2024-08-12 14:21:43,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-12 14:21:46,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8300, loss[loss=0.08346, beats_loss=0.0124, ecapa_loss=0.0001846, whisper_loss=0.06921, over 13734.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001748, whisper_loss=0.09177, over 3942293.18 frames. ], batch size: 55, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:21:50,153 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 14:21:56,435 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 14:21:58,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1677110.0, ans=0.125 2024-08-12 14:21:59,724 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 14:22:06,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.461e+01 2.729e+01 3.210e+01 2.355e+02, threshold=5.459e+01, percent-clipped=3.0 2024-08-12 14:22:20,412 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 14:22:29,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2024-08-12 14:22:45,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.09 vs. 
limit=15.0 2024-08-12 14:22:58,114 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 14:22:58,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1677510.0, ans=0.125 2024-08-12 14:23:01,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1677510.0, ans=0.125 2024-08-12 14:23:12,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8350, loss[loss=0.09453, beats_loss=0.01044, ecapa_loss=0.0002152, whisper_loss=0.08193, over 16796.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001749, whisper_loss=0.09151, over 3933499.79 frames. ], batch size: 71, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:23:15,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-12 14:23:24,851 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 14:23:27,055 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 14:23:56,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1677810.0, ans=0.0 2024-08-12 14:23:59,498 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 14:24:21,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1678010.0, ans=0.1 2024-08-12 14:24:25,244 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 14:24:38,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8400, loss[loss=0.1008, beats_loss=0.01235, ecapa_loss=0.0001638, whisper_loss=0.08681, over 22026.00 frames. 
], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001754, whisper_loss=0.09138, over 3906181.45 frames. ], batch size: 88, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:24:45,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1678110.0, ans=0.0 2024-08-12 14:24:54,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1678210.0, ans=0.125 2024-08-12 14:24:59,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.506e+01 2.766e+01 3.211e+01 4.644e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:25:11,371 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-12 14:25:27,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1678310.0, ans=0.125 2024-08-12 14:25:28,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1678410.0, ans=0.0 2024-08-12 14:25:43,329 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:25:52,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1678510.0, ans=0.125 2024-08-12 14:26:02,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8450, loss[loss=0.1033, beats_loss=0.0115, ecapa_loss=0.000151, whisper_loss=0.09032, over 21226.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001743, whisper_loss=0.09172, over 3895141.10 frames. ], batch size: 83, lr: 5.24e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:26:04,774 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 14:26:15,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1678610.0, ans=0.125 2024-08-12 14:26:16,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1678610.0, ans=0.1 2024-08-12 14:26:30,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.82 vs. limit=15.0 2024-08-12 14:26:33,558 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-12 14:26:42,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1678810.0, ans=0.125 2024-08-12 14:26:42,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1678810.0, ans=0.125 2024-08-12 14:26:45,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1678810.0, ans=0.0 2024-08-12 14:26:57,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1678910.0, ans=0.125 2024-08-12 14:27:00,367 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 14:27:09,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-12 14:27:11,855 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 14:27:17,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1679010.0, ans=0.0 2024-08-12 14:27:24,379 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8500, loss[loss=0.1174, beats_loss=0.01136, ecapa_loss=0.0001731, whisper_loss=0.1043, over 22349.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001739, whisper_loss=0.09182, over 3916545.20 frames. ], batch size: 90, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:27:44,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.527e+01 2.828e+01 3.185e+01 5.995e+01, threshold=5.655e+01, percent-clipped=1.0 2024-08-12 14:27:52,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1679210.0, ans=0.1 2024-08-12 14:28:34,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1679510.0, ans=0.0 2024-08-12 14:28:34,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1679510.0, ans=0.125 2024-08-12 14:28:34,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1679510.0, ans=0.0 2024-08-12 14:28:41,356 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 14:28:47,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1679510.0, ans=0.125 2024-08-12 14:28:55,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8550, loss[loss=0.1343, beats_loss=0.009814, ecapa_loss=0.0001631, whisper_loss=0.1229, over 23268.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01089, ecapa_loss=0.0001736, whisper_loss=0.09282, over 3909376.22 frames. 
], batch size: 88, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:29:01,100 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-12 14:29:32,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-08-12 14:29:35,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1679810.0, ans=0.0 2024-08-12 14:29:42,788 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 14:30:00,836 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 14:30:08,111 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-168000.pt 2024-08-12 14:30:24,836 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 14:30:25,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-08-12 14:30:32,402 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8600, loss[loss=0.1173, beats_loss=0.0125, ecapa_loss=0.0001456, whisper_loss=0.1033, over 21257.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.000175, whisper_loss=0.09251, over 3904514.94 frames. 
], batch size: 82, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:30:43,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1680110.0, ans=0.2 2024-08-12 14:30:52,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1680210.0, ans=0.1 2024-08-12 14:30:55,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.576e+01 2.836e+01 3.188e+01 4.951e+01, threshold=5.672e+01, percent-clipped=0.0 2024-08-12 14:30:59,379 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 29 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 14:31:26,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-12 14:31:27,406 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 14:31:44,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-08-12 14:31:54,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8650, loss[loss=0.1023, beats_loss=0.01329, ecapa_loss=0.0001877, whisper_loss=0.08712, over 21392.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001755, whisper_loss=0.09211, over 3890637.39 frames. ], batch size: 90, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:32:22,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1680810.0, ans=0.035 2024-08-12 14:32:49,842 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 14:32:50,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1680910.0, ans=0.125 2024-08-12 14:32:51,091 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 14:32:51,427 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.901e+05 2024-08-12 14:32:54,537 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 14:33:07,974 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8700, loss[loss=0.1087, beats_loss=0.007959, ecapa_loss=0.0002217, whisper_loss=0.09853, over 19544.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01096, ecapa_loss=0.0001757, whisper_loss=0.09223, over 3869204.38 frames. ], batch size: 79, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:33:11,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1681110.0, ans=0.125 2024-08-12 14:33:13,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.66 vs. 
limit=15.0 2024-08-12 14:33:18,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1681110.0, ans=0.5 2024-08-12 14:33:22,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1681210.0, ans=0.1 2024-08-12 14:33:25,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.512e+01 2.777e+01 3.126e+01 4.363e+01, threshold=5.553e+01, percent-clipped=0.0 2024-08-12 14:33:33,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1681210.0, ans=0.0 2024-08-12 14:33:35,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-12 14:33:38,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1681310.0, ans=0.0 2024-08-12 14:33:39,059 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 14:33:56,756 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 16 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 14:33:58,155 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 14:33:59,535 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 14:34:11,608 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.286e-02 2024-08-12 14:34:21,455 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8750, loss[loss=0.09356, beats_loss=0.01136, ecapa_loss=0.0001834, whisper_loss=0.08036, over 17121.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001768, whisper_loss=0.09189, over 3818652.89 frames. 
], batch size: 69, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:34:34,130 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 14:34:51,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2024-08-12 14:34:58,986 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 14:35:07,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=12.0 2024-08-12 14:35:17,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-08-12 14:35:21,050 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 14:35:33,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8800, loss[loss=0.1061, beats_loss=0.01244, ecapa_loss=0.0001844, whisper_loss=0.09185, over 22951.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01086, ecapa_loss=0.0001767, whisper_loss=0.09267, over 3854586.43 frames. ], batch size: 92, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:35:44,233 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 14:35:44,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1682110.0, ans=0.125 2024-08-12 14:35:45,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1682110.0, ans=0.0 2024-08-12 14:35:53,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.570e+01 2.828e+01 3.387e+01 1.190e+02, threshold=5.656e+01, percent-clipped=1.0 2024-08-12 14:35:59,592 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 14:36:02,763 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 14:36:17,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-08-12 14:36:26,366 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 20 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 14:36:41,457 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:36:45,187 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.096e+00 2024-08-12 14:36:56,286 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8850, loss[loss=0.1073, beats_loss=0.01122, ecapa_loss=0.0001657, whisper_loss=0.09442, over 21562.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01088, ecapa_loss=0.0001765, whisper_loss=0.09277, over 3857330.11 frames. ], batch size: 87, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:37:06,907 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 14:37:07,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-12 14:37:10,774 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 14:37:14,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1682710.0, ans=0.125 2024-08-12 14:37:20,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1682710.0, ans=0.125 2024-08-12 14:37:36,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:37:39,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1682810.0, ans=10.0 2024-08-12 14:37:41,422 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 14:37:56,871 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 14:38:01,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1683010.0, ans=0.1 2024-08-12 14:38:14,931 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-12 14:38:16,494 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8900, loss[loss=0.08973, beats_loss=0.008916, ecapa_loss=0.0002001, whisper_loss=0.07881, over 17485.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001763, whisper_loss=0.0925, over 3880057.59 frames. ], batch size: 73, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:38:18,027 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 14:38:19,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1683110.0, ans=0.0 2024-08-12 14:38:37,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.454e+01 2.719e+01 3.172e+01 4.928e+01, threshold=5.438e+01, percent-clipped=0.0 2024-08-12 14:38:54,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1683310.0, ans=0.1 2024-08-12 14:39:01,179 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-12 14:39:04,123 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 11 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 14:39:06,989 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 14:39:33,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1683510.0, ans=0.0 2024-08-12 14:39:38,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 8950, loss[loss=0.09045, beats_loss=0.01217, ecapa_loss=0.0002043, whisper_loss=0.07623, over 17502.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01089, ecapa_loss=0.0001765, whisper_loss=0.0923, over 3863219.21 frames. ], batch size: 76, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:39:44,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1683610.0, ans=0.125 2024-08-12 14:40:12,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1683810.0, ans=0.125 2024-08-12 14:40:17,008 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-12 14:40:21,843 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
25 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-12 14:40:38,086 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-12 14:40:59,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9000, loss[loss=0.1209, beats_loss=0.009071, ecapa_loss=0.0001834, whisper_loss=0.11, over 22660.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01101, ecapa_loss=0.0001761, whisper_loss=0.09124, over 3875381.00 frames. ], batch size: 92, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:40:59,038 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 14:41:08,251 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7206, 2.8812, 2.0953, 3.2029], device='cuda:0') 2024-08-12 14:41:38,375 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.000585, whisper_loss=0.2487, over 922467.00 frames. 2024-08-12 14:41:57,508 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on SV_voxceleb1: loss=0.004785, beats_loss=0, ecapa_loss=0.0004785, whisper_loss=0, over 939242.00 frames. 2024-08-12 14:43:13,836 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7189, 2.2000, 2.0499, 1.9273], device='cuda:0') 2024-08-12 14:43:56,581 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on AT_audioset: loss=0.02422, beats_loss=0.02422, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 14:43:56,586 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 14:43:58,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1684110.0, ans=0.125 2024-08-12 14:44:05,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1684110.0, ans=0.1 2024-08-12 14:44:07,772 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 14:44:07,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1684110.0, ans=0.125 2024-08-12 14:44:15,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.474e+01 2.766e+01 3.028e+01 3.985e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 14:44:18,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1684210.0, ans=0.1 2024-08-12 14:44:32,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1684310.0, ans=0.2 2024-08-12 14:44:44,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1684410.0, ans=0.125 2024-08-12 14:44:47,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1684410.0, ans=0.125 2024-08-12 14:44:47,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1684410.0, ans=0.0 2024-08-12 14:44:51,904 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 14:44:57,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-12 14:44:59,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1684510.0, ans=0.2 2024-08-12 14:45:04,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1684510.0, ans=0.125 2024-08-12 14:45:15,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9050, loss[loss=0.08048, beats_loss=0.01283, ecapa_loss=0.0001212, whisper_loss=0.06644, over 15863.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01098, ecapa_loss=0.0001754, whisper_loss=0.09172, over 3904536.44 frames. ], batch size: 61, lr: 5.23e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:45:32,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1684710.0, ans=0.125 2024-08-12 14:45:35,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1684710.0, ans=0.2 2024-08-12 14:45:42,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0 2024-08-12 14:45:45,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1684710.0, ans=0.125 2024-08-12 14:45:47,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.86 vs. 
limit=10.0 2024-08-12 14:45:51,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1684810.0, ans=0.1 2024-08-12 14:46:09,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1684910.0, ans=0.125 2024-08-12 14:46:10,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1684910.0, ans=0.125 2024-08-12 14:46:23,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1685010.0, ans=0.0 2024-08-12 14:46:35,118 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9100, loss[loss=0.0915, beats_loss=0.0133, ecapa_loss=0.0002097, whisper_loss=0.07611, over 17863.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01092, ecapa_loss=0.0001764, whisper_loss=0.09222, over 3893827.23 frames. ], batch size: 75, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:46:39,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1685110.0, ans=0.125 2024-08-12 14:46:42,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1685110.0, ans=0.1 2024-08-12 14:46:47,577 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 14:46:52,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+01 2.564e+01 2.836e+01 3.271e+01 5.149e+01, threshold=5.673e+01, percent-clipped=0.0 2024-08-12 14:46:54,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1685210.0, ans=0.0 2024-08-12 14:47:01,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-08-12 14:47:03,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-12 14:47:06,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1685310.0, ans=0.0 2024-08-12 14:47:06,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1685310.0, ans=0.07 2024-08-12 14:47:15,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1685310.0, ans=0.125 2024-08-12 14:47:15,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1685310.0, ans=0.125 2024-08-12 14:47:25,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=15.0 2024-08-12 14:47:31,470 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 14:47:36,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1685510.0, ans=0.07 2024-08-12 14:47:44,077 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
20 from LS+wenet, 6 from Vox, 33 fro AS 2024-08-12 14:47:51,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9150, loss[loss=0.09224, beats_loss=0.01066, ecapa_loss=0.000234, whisper_loss=0.07924, over 22631.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001767, whisper_loss=0.09189, over 3877447.03 frames. ], batch size: 98, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:48:07,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1685710.0, ans=0.125 2024-08-12 14:48:08,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-12 14:48:14,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1685710.0, ans=0.125 2024-08-12 14:48:56,369 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 14:49:06,694 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9200, loss[loss=0.1112, beats_loss=0.009176, ecapa_loss=0.0001995, whisper_loss=0.09998, over 22368.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001765, whisper_loss=0.09258, over 3898984.44 frames. ], batch size: 93, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:49:07,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1686110.0, ans=0.0 2024-08-12 14:49:09,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1686110.0, ans=0.1 2024-08-12 14:49:19,446 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 14:49:19,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-12 14:49:23,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.540e+01 2.969e+01 3.284e+01 5.041e+01, threshold=5.938e+01, percent-clipped=0.0 2024-08-12 14:49:47,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1686310.0, ans=0.0 2024-08-12 14:50:08,676 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 14:50:24,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9250, loss[loss=0.1072, beats_loss=0.005695, ecapa_loss=0.0001801, whisper_loss=0.09966, over 17167.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01105, ecapa_loss=0.0001758, whisper_loss=0.09151, over 3912070.26 frames. ], batch size: 63, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:50:35,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1686610.0, ans=0.125 2024-08-12 14:50:40,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1686710.0, ans=0.0 2024-08-12 14:50:58,186 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 14:51:04,379 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 14:51:10,863 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 15 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 14:51:12,983 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-12 14:51:14,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1686810.0, ans=0.0 2024-08-12 14:51:16,169 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 14:51:16,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1686810.0, ans=0.125 2024-08-12 14:51:25,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1686910.0, ans=0.125 2024-08-12 14:51:32,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1687010.0, ans=0.2 2024-08-12 14:51:36,794 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 14:51:49,234 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9300, loss[loss=0.0883, beats_loss=0.01132, ecapa_loss=0.0001509, whisper_loss=0.07547, over 17400.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001752, whisper_loss=0.09183, over 3913365.33 frames. ], batch size: 69, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:51:59,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687110.0, ans=0.1 2024-08-12 14:52:09,972 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.511e+01 2.773e+01 3.215e+01 9.080e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-12 14:52:10,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1687210.0, ans=0.0 2024-08-12 14:52:39,357 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 14:52:41,002 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
23 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-12 14:52:44,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-12 14:52:51,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1687410.0, ans=0.0 2024-08-12 14:52:52,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1687410.0, ans=0.125 2024-08-12 14:52:56,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-12 14:53:01,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-12 14:53:06,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1687510.0, ans=0.0 2024-08-12 14:53:14,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9350, loss[loss=0.08681, beats_loss=0.01214, ecapa_loss=0.0001579, whisper_loss=0.07309, over 22372.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001767, whisper_loss=0.09154, over 3878042.25 frames. ], batch size: 91, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:53:36,855 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 14:53:39,077 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 14:53:49,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1687710.0, ans=0.1 2024-08-12 14:54:29,759 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 14:54:31,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=22.5 2024-08-12 14:54:52,109 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9400, loss[loss=0.09707, beats_loss=0.0101, ecapa_loss=0.0002055, whisper_loss=0.08492, over 21315.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01101, ecapa_loss=0.0001757, whisper_loss=0.0908, over 3857029.74 frames. ], batch size: 87, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:55:05,886 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 14:55:14,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1688210.0, ans=0.04949747468305833 2024-08-12 14:55:18,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.357e+01 2.577e+01 2.940e+01 4.355e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-12 14:56:16,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2024-08-12 14:56:29,032 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9450, loss[loss=0.1114, beats_loss=0.0081, ecapa_loss=0.0002046, whisper_loss=0.1013, over 14806.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01102, ecapa_loss=0.0001762, whisper_loss=0.09099, over 3868908.43 frames. 
], batch size: 56, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:57:11,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1688810.0, ans=0.125 2024-08-12 14:57:26,178 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09947884827852249, model_norm_threshold=51.535552978515625 2024-08-12 14:57:26,346 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.99, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.656e+05, grad_sumsq=2.952e+04, orig_rms_sq=8.999e+00 2024-08-12 14:57:31,983 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-12 14:57:45,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1689010.0, ans=0.125 2024-08-12 14:58:02,640 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9500, loss[loss=0.09364, beats_loss=0.01152, ecapa_loss=0.0001644, whisper_loss=0.08048, over 17223.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01094, ecapa_loss=0.0001753, whisper_loss=0.09248, over 3877535.04 frames. ], batch size: 69, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:58:06,298 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-12 14:58:07,928 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-12 14:58:25,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.024e+01 2.537e+01 2.807e+01 3.213e+01 5.181e+02, threshold=5.615e+01, percent-clipped=1.0 2024-08-12 14:59:21,406 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 14:59:26,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9550, loss[loss=0.09614, beats_loss=0.009753, ecapa_loss=0.0001553, whisper_loss=0.08483, over 14657.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001756, whisper_loss=0.09195, over 3866802.65 frames. ], batch size: 55, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 14:59:31,377 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 14:59:48,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1689710.0, ans=0.0 2024-08-12 14:59:56,213 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 15:00:00,275 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 15:00:14,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2024-08-12 15:00:17,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1689910.0, ans=0.125 2024-08-12 15:00:22,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1690010.0, ans=0.125 2024-08-12 15:00:33,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1690010.0, ans=0.125 2024-08-12 15:00:34,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1690110.0, ans=0.125 2024-08-12 15:00:35,946 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9600, loss[loss=0.108, beats_loss=0.01232, ecapa_loss=0.0001616, whisper_loss=0.09402, over 17889.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01087, ecapa_loss=0.0001759, whisper_loss=0.09173, over 3832487.79 frames. ], batch size: 72, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:00:37,497 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 15:00:39,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1690110.0, ans=0.125 2024-08-12 15:00:39,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1690110.0, ans=0.09899494936611666 2024-08-12 15:00:42,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.47 vs. limit=15.0 2024-08-12 15:00:49,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1690210.0, ans=0.1 2024-08-12 15:00:53,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.591e+01 2.857e+01 3.252e+01 5.691e+01, threshold=5.714e+01, percent-clipped=2.0 2024-08-12 15:01:18,985 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-12 15:01:21,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1690410.0, ans=0.125 2024-08-12 15:01:22,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.29 vs. 
limit=10.0 2024-08-12 15:01:32,333 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:01:34,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1690510.0, ans=0.2 2024-08-12 15:01:36,056 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 15:01:40,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1690510.0, ans=0.0 2024-08-12 15:01:43,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1690610.0, ans=0.125 2024-08-12 15:01:44,261 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9650, loss[loss=0.07075, beats_loss=0.0125, ecapa_loss=0.0001995, whisper_loss=0.05625, over 13978.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001756, whisper_loss=0.09176, over 3827909.28 frames. ], batch size: 60, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:02:08,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1690710.0, ans=0.09899494936611666 2024-08-12 15:02:10,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1690810.0, ans=22.5 2024-08-12 15:02:10,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. 
limit=10.0 2024-08-12 15:02:21,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1690810.0, ans=0.125 2024-08-12 15:02:26,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1690910.0, ans=0.0 2024-08-12 15:02:37,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1691010.0, ans=0.0 2024-08-12 15:02:47,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1691010.0, ans=0.125 2024-08-12 15:02:52,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9700, loss[loss=0.1211, beats_loss=0.01267, ecapa_loss=0.0001681, whisper_loss=0.1068, over 22380.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001774, whisper_loss=0.09172, over 3856899.38 frames. ], batch size: 88, lr: 5.22e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:02:53,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1691110.0, ans=0.125 2024-08-12 15:03:07,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1691210.0, ans=0.125 2024-08-12 15:03:08,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. 
limit=6.0 2024-08-12 15:03:10,784 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.535e+01 2.821e+01 3.429e+01 6.519e+01, threshold=5.641e+01, percent-clipped=1.0 2024-08-12 15:03:25,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=1691310.0, ans=15.0 2024-08-12 15:03:38,526 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-12 15:03:48,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2024-08-12 15:04:04,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9750, loss[loss=0.1317, beats_loss=0.008187, ecapa_loss=0.0002286, whisper_loss=0.1212, over 23045.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.000178, whisper_loss=0.09162, over 3859534.47 frames. ], batch size: 91, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:04:08,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1691610.0, ans=0.0 2024-08-12 15:04:12,721 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 15:04:19,553 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 22 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-12 15:04:38,394 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-12 15:04:49,518 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 15:04:52,398 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 15:04:59,165 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 15:05:01,652 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 34 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 15:05:07,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1692010.0, ans=0.2 2024-08-12 15:05:12,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9800, loss[loss=0.09727, beats_loss=0.01004, ecapa_loss=0.0001806, whisper_loss=0.08543, over 15577.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01089, ecapa_loss=0.0001775, whisper_loss=0.09279, over 3867675.75 frames. ], batch size: 62, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:05:18,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1692110.0, ans=0.125 2024-08-12 15:05:27,716 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-12 15:05:30,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.561e+01 2.818e+01 3.285e+01 1.389e+02, threshold=5.636e+01, percent-clipped=4.0 2024-08-12 15:05:38,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-12 15:06:07,050 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-12 15:06:09,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1692510.0, ans=0.125 2024-08-12 15:06:19,291 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9850, loss[loss=0.09397, beats_loss=0.01205, ecapa_loss=0.0002017, whisper_loss=0.0799, over 22103.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01093, ecapa_loss=0.0001772, whisper_loss=0.09263, over 3893350.10 frames. 
], batch size: 92, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:06:20,830 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 15:06:26,585 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 15:06:40,351 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-12 15:07:23,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1693010.0, ans=0.07 2024-08-12 15:07:25,802 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-12 15:07:28,330 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9900, loss[loss=0.06451, beats_loss=0.0147, ecapa_loss=0.0001196, whisper_loss=0.04861, over 15653.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001764, whisper_loss=0.09226, over 3872701.65 frames. 
], batch size: 61, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:07:28,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1693110.0, ans=0.1 2024-08-12 15:07:33,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1693110.0, ans=0.125 2024-08-12 15:07:44,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1693210.0, ans=0.0 2024-08-12 15:07:46,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.533e+01 2.789e+01 3.190e+01 6.872e+01, threshold=5.578e+01, percent-clipped=1.0 2024-08-12 15:08:05,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1693310.0, ans=0.5 2024-08-12 15:08:14,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1693410.0, ans=0.125 2024-08-12 15:08:14,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1693410.0, ans=0.125 2024-08-12 15:08:28,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1693510.0, ans=0.125 2024-08-12 15:08:38,421 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 9950, loss[loss=0.1103, beats_loss=0.01066, ecapa_loss=0.0001925, whisper_loss=0.09769, over 22994.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01099, ecapa_loss=0.0001763, whisper_loss=0.0923, over 3857356.28 frames. ], batch size: 94, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:08:41,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. 
limit=15.0 2024-08-12 15:08:45,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1693610.0, ans=0.2 2024-08-12 15:09:05,342 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-12 15:09:05,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=1693710.0, ans=12.0 2024-08-12 15:09:12,975 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 15:09:13,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1693810.0, ans=0.1 2024-08-12 15:09:24,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1693910.0, ans=0.125 2024-08-12 15:09:34,041 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 15:09:38,175 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 15:09:41,545 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:09:47,787 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-12 15:09:54,014 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10000, loss[loss=0.1143, beats_loss=0.01081, ecapa_loss=0.0001709, whisper_loss=0.1018, over 22776.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01097, ecapa_loss=0.0001763, whisper_loss=0.09286, over 3877033.87 frames. 
], batch size: 91, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:10:11,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.072e+01 2.583e+01 2.831e+01 3.339e+01 3.966e+02, threshold=5.663e+01, percent-clipped=2.0 2024-08-12 15:10:13,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1694210.0, ans=0.0 2024-08-12 15:10:18,710 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-12 15:10:21,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1694310.0, ans=0.125 2024-08-12 15:10:24,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1694310.0, ans=0.025 2024-08-12 15:10:29,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1694310.0, ans=0.2 2024-08-12 15:10:58,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-08-12 15:11:01,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10050, loss[loss=0.118, beats_loss=0.01076, ecapa_loss=0.0001449, whisper_loss=0.1058, over 21461.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01098, ecapa_loss=0.0001753, whisper_loss=0.09305, over 3899187.13 frames. ], batch size: 80, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:11:11,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1694610.0, ans=0.05 2024-08-12 15:11:15,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. 
limit=15.0 2024-08-12 15:11:15,489 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 15:11:19,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1694710.0, ans=0.0 2024-08-12 15:11:25,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1694710.0, ans=0.125 2024-08-12 15:11:53,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1694910.0, ans=0.125 2024-08-12 15:11:57,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.69 vs. limit=5.0 2024-08-12 15:12:01,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1695010.0, ans=0.1 2024-08-12 15:12:14,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10100, loss[loss=0.1036, beats_loss=0.01285, ecapa_loss=0.0001862, whisper_loss=0.08885, over 21369.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001757, whisper_loss=0.09224, over 3883129.19 frames. ], batch size: 90, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:12:18,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1695110.0, ans=0.125 2024-08-12 15:12:28,265 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 15:12:33,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.463e+01 2.716e+01 3.042e+01 6.161e+01, threshold=5.433e+01, percent-clipped=3.0 2024-08-12 15:12:50,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. 
limit=15.0 2024-08-12 15:12:54,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1695310.0, ans=0.0 2024-08-12 15:13:11,520 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 15:13:17,927 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 15:13:24,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2024-08-12 15:13:28,880 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10150, loss[loss=0.09498, beats_loss=0.01233, ecapa_loss=0.0001339, whisper_loss=0.08132, over 14696.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01103, ecapa_loss=0.0001765, whisper_loss=0.09232, over 3906397.48 frames. ], batch size: 57, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:13:37,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1695610.0, ans=0.1 2024-08-12 15:13:41,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1695710.0, ans=0.125 2024-08-12 15:13:44,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1695710.0, ans=0.125 2024-08-12 15:13:49,664 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 15:13:50,854 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 15:14:03,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1695810.0, ans=0.125 2024-08-12 15:14:26,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0 2024-08-12 15:14:36,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10200, loss[loss=0.1081, beats_loss=0.01267, ecapa_loss=0.0001656, whisper_loss=0.09379, over 23769.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01101, ecapa_loss=0.0001772, whisper_loss=0.0922, over 3908975.14 frames. ], batch size: 93, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:14:38,032 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-12 15:14:39,403 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 15:14:54,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.514e+01 2.832e+01 3.281e+01 6.809e+01, threshold=5.664e+01, percent-clipped=1.0 2024-08-12 15:15:04,390 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-12 15:15:13,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1696310.0, ans=0.1 2024-08-12 15:15:13,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1696310.0, ans=0.125 2024-08-12 15:15:15,702 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 15:15:32,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1696510.0, ans=0.0 2024-08-12 15:15:46,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10250, loss[loss=0.1279, beats_loss=0.009821, ecapa_loss=0.0001459, whisper_loss=0.1167, over 17955.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001769, whisper_loss=0.09241, over 3899326.78 frames. ], batch size: 67, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:15:49,290 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-12 15:15:53,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1696610.0, ans=0.0 2024-08-12 15:15:55,779 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 15:16:04,096 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-12 15:16:05,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1696710.0, ans=0.0 2024-08-12 15:16:20,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1696810.0, ans=0.025 2024-08-12 15:16:33,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1696910.0, ans=0.07 2024-08-12 15:16:44,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1697010.0, ans=0.125 2024-08-12 15:16:57,061 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10300, loss[loss=0.1069, beats_loss=0.01092, ecapa_loss=0.0001616, whisper_loss=0.0944, over 17169.00 frames. 
], tot_loss[loss=0.1051, beats_loss=0.01097, ecapa_loss=0.0001752, whisper_loss=0.09233, over 3891880.38 frames. ], batch size: 68, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:17:05,072 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 15:17:08,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1697110.0, ans=0.125 2024-08-12 15:17:15,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1697210.0, ans=0.125 2024-08-12 15:17:16,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.570e+01 2.801e+01 3.230e+01 4.716e+01, threshold=5.603e+01, percent-clipped=0.0 2024-08-12 15:17:28,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1697310.0, ans=0.125 2024-08-12 15:17:32,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1697310.0, ans=0.125 2024-08-12 15:17:33,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-12 15:17:34,303 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 15:17:37,722 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.532e-01 2024-08-12 15:17:40,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1697410.0, ans=0.125 2024-08-12 15:18:05,881 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
24 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-12 15:18:09,642 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10350, loss[loss=0.09619, beats_loss=0.01215, ecapa_loss=0.0001765, whisper_loss=0.08228, over 21025.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01098, ecapa_loss=0.0001754, whisper_loss=0.09217, over 3892248.44 frames. ], batch size: 85, lr: 5.21e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:18:28,882 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 15:18:37,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-12 15:18:44,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1697810.0, ans=0.125 2024-08-12 15:18:51,600 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 15:18:57,119 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 15:19:11,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2024-08-12 15:19:12,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-08-12 15:19:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1698010.0, ans=0.0 2024-08-12 15:19:17,175 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10400, loss[loss=0.0973, beats_loss=0.01345, ecapa_loss=0.0001579, whisper_loss=0.08227, over 13494.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01093, ecapa_loss=0.0001751, whisper_loss=0.09249, over 3881762.55 frames. 
], batch size: 53, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:19:17,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1698110.0, ans=0.125 2024-08-12 15:19:23,194 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 15:19:29,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1698110.0, ans=0.125 2024-08-12 15:19:35,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.431e+01 2.766e+01 3.090e+01 4.882e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 15:19:54,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-08-12 15:20:19,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1698510.0, ans=0.0 2024-08-12 15:20:24,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10450, loss[loss=0.08742, beats_loss=0.01048, ecapa_loss=0.0001886, whisper_loss=0.07506, over 20650.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0109, ecapa_loss=0.0001765, whisper_loss=0.09249, over 3878433.81 frames. ], batch size: 84, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:20:30,131 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 15:20:44,271 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 15:20:45,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. 
limit=15.0 2024-08-12 15:21:02,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1698810.0, ans=0.125 2024-08-12 15:21:04,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1698910.0, ans=0.125 2024-08-12 15:21:23,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1699010.0, ans=0.125 2024-08-12 15:21:32,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10500, loss[loss=0.1101, beats_loss=0.01147, ecapa_loss=0.0001952, whisper_loss=0.09673, over 22489.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001775, whisper_loss=0.09226, over 3886253.31 frames. ], batch size: 93, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:21:40,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1699110.0, ans=0.125 2024-08-12 15:21:50,597 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.054e+01 2.539e+01 2.734e+01 3.108e+01 4.878e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-12 15:21:53,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1699210.0, ans=0.0 2024-08-12 15:22:04,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5 2024-08-12 15:22:13,958 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 15:22:19,473 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
12 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 15:22:20,895 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:22:23,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1699410.0, ans=0.5 2024-08-12 15:22:29,839 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 15:22:31,014 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 15:22:34,092 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.611e+00 2024-08-12 15:22:40,598 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10550, loss[loss=0.1195, beats_loss=0.01007, ecapa_loss=0.0001748, whisper_loss=0.1077, over 20689.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001769, whisper_loss=0.09225, over 3892991.43 frames. ], batch size: 81, lr: 5.20e-03, grad_scale: 1.152921504606847e+18 2024-08-12 15:22:49,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1699610.0, ans=0.1 2024-08-12 15:23:03,318 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
26 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-12 15:23:03,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1699710.0, ans=0.0 2024-08-12 15:23:05,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1699710.0, ans=0.07 2024-08-12 15:23:19,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1699810.0, ans=0.125 2024-08-12 15:23:21,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-08-12 15:23:27,124 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 15:23:29,626 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 15:23:35,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1699910.0, ans=0.1 2024-08-12 15:23:38,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2024-08-12 15:23:51,996 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 15:23:52,959 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10600, loss[loss=0.09842, beats_loss=0.01182, ecapa_loss=0.0001327, whisper_loss=0.08527, over 15408.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001772, whisper_loss=0.09207, over 3873411.86 frames. 
], batch size: 62, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:24:12,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1700210.0, ans=0.125 2024-08-12 15:24:13,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.487e+01 2.727e+01 3.054e+01 5.238e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 15:24:24,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1700310.0, ans=0.015 2024-08-12 15:24:40,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1700410.0, ans=0.0 2024-08-12 15:24:55,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1700510.0, ans=0.0 2024-08-12 15:24:58,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-08-12 15:25:04,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1700510.0, ans=0.125 2024-08-12 15:25:07,452 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10650, loss[loss=0.07788, beats_loss=0.01104, ecapa_loss=0.0001671, whisper_loss=0.06517, over 16598.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01093, ecapa_loss=0.0001765, whisper_loss=0.09239, over 3879383.50 frames. ], batch size: 65, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:25:08,188 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:25:13,445 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-12 15:25:26,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1700710.0, ans=0.125 2024-08-12 15:25:26,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2024-08-12 15:25:28,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2024-08-12 15:25:31,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1700710.0, ans=0.0 2024-08-12 15:25:38,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1700810.0, ans=0.1 2024-08-12 15:25:53,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1700910.0, ans=0.125 2024-08-12 15:25:59,124 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 12 from Vox, 33 from AS 2024-08-12 15:26:02,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1700910.0, ans=0.125 2024-08-12 15:26:02,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1700910.0, ans=0.125 2024-08-12 15:26:10,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1701010.0, ans=0.125 2024-08-12 15:26:20,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10700, loss[loss=0.1032, beats_loss=0.009542, ecapa_loss=0.0001456, whisper_loss=0.09223, over 15732.00 frames.
], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001753, whisper_loss=0.09189, over 3879436.16 frames. ], batch size: 61, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:26:39,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.524e+01 2.760e+01 3.145e+01 5.039e+01, threshold=5.520e+01, percent-clipped=0.0 2024-08-12 15:26:40,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1701210.0, ans=0.125 2024-08-12 15:26:45,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1701210.0, ans=0.0 2024-08-12 15:27:00,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1701410.0, ans=0.125 2024-08-12 15:27:02,777 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 15:27:16,545 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:27:21,491 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 from AS 2024-08-12 15:27:27,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10750, loss[loss=0.1235, beats_loss=0.008728, ecapa_loss=0.000184, whisper_loss=0.1129, over 20840.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001759, whisper_loss=0.09241, over 3900991.36 frames.
], batch size: 80, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:27:42,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1701710.0, ans=0.2 2024-08-12 15:27:53,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1701810.0, ans=0.125 2024-08-12 15:27:56,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1701810.0, ans=15.0 2024-08-12 15:28:04,456 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS 2024-08-12 15:28:11,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1701910.0, ans=0.1 2024-08-12 15:28:24,613 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 from AS 2024-08-12 15:28:27,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2024-08-12 15:28:28,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1702010.0, ans=0.1 2024-08-12 15:28:35,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10800, loss[loss=0.09933, beats_loss=0.01023, ecapa_loss=0.000162, whisper_loss=0.08748, over 16538.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01102, ecapa_loss=0.000175, whisper_loss=0.09224, over 3877878.14 frames. ], batch size: 64, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:28:40,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-08-12 15:28:46,721 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
24 from LS+wenet, 18 from Vox, 36 from AS 2024-08-12 15:28:48,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1702210.0, ans=0.125 2024-08-12 15:28:53,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=12.0 2024-08-12 15:28:54,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.536e+01 2.905e+01 3.267e+01 1.637e+02, threshold=5.810e+01, percent-clipped=2.0 2024-08-12 15:28:58,512 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 19 from LS+wenet, 15 from Vox, 40 from AS 2024-08-12 15:28:59,834 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-12 15:29:03,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1702310.0, ans=0.125 2024-08-12 15:29:05,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1702310.0, ans=0.025 2024-08-12 15:29:42,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10850, loss[loss=0.1106, beats_loss=0.009218, ecapa_loss=0.0001768, whisper_loss=0.09966, over 19763.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01104, ecapa_loss=0.0001759, whisper_loss=0.09281, over 3896553.72 frames. ], batch size: 77, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:29:53,274 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 from AS 2024-08-12 15:30:03,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1702710.0, ans=0.125 2024-08-12 15:30:19,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.89 vs.
limit=10.0 2024-08-12 15:30:23,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-12 15:30:35,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1703010.0, ans=0.04949747468305833 2024-08-12 15:30:42,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1703010.0, ans=0.05 2024-08-12 15:30:43,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1703010.0, ans=0.125 2024-08-12 15:30:47,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1703010.0, ans=0.0 2024-08-12 15:30:47,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1703010.0, ans=0.0 2024-08-12 15:30:50,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10900, loss[loss=0.1259, beats_loss=0.009726, ecapa_loss=0.0001906, whisper_loss=0.1143, over 14524.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01093, ecapa_loss=0.0001755, whisper_loss=0.09328, over 3877345.50 frames. ], batch size: 58, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:30:59,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. 
limit=6.0 2024-08-12 15:31:04,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1703210.0, ans=0.0 2024-08-12 15:31:08,861 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.493e+01 2.855e+01 3.171e+01 4.648e+01, threshold=5.710e+01, percent-clipped=0.0 2024-08-12 15:31:09,097 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-12 15:31:24,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1703310.0, ans=0.125 2024-08-12 15:31:36,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1703410.0, ans=0.125 2024-08-12 15:31:43,235 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-12 15:31:46,730 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 15:31:52,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1703510.0, ans=0.1 2024-08-12 15:31:57,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 10950, loss[loss=0.1149, beats_loss=0.01099, ecapa_loss=0.0001632, whisper_loss=0.1023, over 23591.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01094, ecapa_loss=0.0001755, whisper_loss=0.09309, over 3891589.65 frames. ], batch size: 94, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:31:58,665 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts.
25 from LS+wenet, 29 from Vox, 41 from AS 2024-08-12 15:32:15,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1703710.0, ans=0.125 2024-08-12 15:32:16,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-12 15:32:23,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1703710.0, ans=0.125 2024-08-12 15:32:33,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1703810.0, ans=0.125 2024-08-12 15:33:05,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1704010.0, ans=0.125 2024-08-12 15:33:13,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11000, loss[loss=0.08941, beats_loss=0.01395, ecapa_loss=0.0002062, whisper_loss=0.07339, over 22046.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01093, ecapa_loss=0.0001764, whisper_loss=0.09337, over 3931437.72 frames. ], batch size: 94, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:33:18,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-08-12 15:33:20,477 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts.
25 from LS+wenet, 21 from Vox, 40 from AS 2024-08-12 15:33:23,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1704110.0, ans=0.125 2024-08-12 15:33:32,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.453e+01 2.776e+01 3.261e+01 5.617e+01, threshold=5.552e+01, percent-clipped=0.0 2024-08-12 15:33:36,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1704210.0, ans=0.0 2024-08-12 15:33:38,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1704210.0, ans=0.0 2024-08-12 15:33:41,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1704310.0, ans=0.2 2024-08-12 15:33:46,457 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 27 from LS+wenet, 22 from Vox, 22 from AS 2024-08-12 15:33:50,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1704310.0, ans=0.2 2024-08-12 15:34:00,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2024-08-12 15:34:06,525 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 from AS 2024-08-12 15:34:16,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1704510.0, ans=0.05 2024-08-12 15:34:21,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11050, loss[loss=0.1128, beats_loss=0.01048, ecapa_loss=0.0001706, whisper_loss=0.1006, over 20903.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001768, whisper_loss=0.09243, over 3917999.44 frames.
], batch size: 82, lr: 5.20e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:34:28,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1704610.0, ans=0.035 2024-08-12 15:34:42,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-12 15:34:53,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1704810.0, ans=0.1 2024-08-12 15:34:54,959 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 from AS 2024-08-12 15:34:59,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1704810.0, ans=0.125 2024-08-12 15:35:29,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11100, loss[loss=0.1003, beats_loss=0.01092, ecapa_loss=0.0001849, whisper_loss=0.08755, over 19819.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01086, ecapa_loss=0.0001769, whisper_loss=0.09303, over 3882795.15 frames.
], batch size: 83, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:35:32,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1705110.0, ans=0.1 2024-08-12 15:35:45,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1705210.0, ans=0.125 2024-08-12 15:35:48,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.398e+01 2.655e+01 3.117e+01 6.342e+01, threshold=5.309e+01, percent-clipped=1.0 2024-08-12 15:36:06,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1705310.0, ans=0.125 2024-08-12 15:36:10,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1705410.0, ans=0.125 2024-08-12 15:36:24,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2024-08-12 15:36:34,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1705510.0, ans=0.125 2024-08-12 15:36:35,870 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 25 from Vox, 28 from AS 2024-08-12 15:36:38,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11150, loss[loss=0.1092, beats_loss=0.008879, ecapa_loss=0.0001629, whisper_loss=0.09871, over 17370.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01088, ecapa_loss=0.0001755, whisper_loss=0.093, over 3894222.42 frames. ], batch size: 65, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:36:51,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.13 vs.
limit=15.0 2024-08-12 15:37:18,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1705910.0, ans=0.125 2024-08-12 15:37:26,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-12 15:37:32,698 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 15:37:46,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11200, loss[loss=0.1122, beats_loss=0.009825, ecapa_loss=0.0001561, whisper_loss=0.1008, over 19139.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01079, ecapa_loss=0.000175, whisper_loss=0.09361, over 3900207.68 frames. ], batch size: 73, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:37:49,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1706110.0, ans=0.125 2024-08-12 15:38:04,380 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-12 15:38:05,489 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.490e+01 2.836e+01 3.047e+01 5.086e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 15:38:28,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-12 15:38:40,250 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-12 15:38:40,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1706510.0, ans=0.125 2024-08-12 15:38:53,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11250, loss[loss=0.1188, beats_loss=0.01031, ecapa_loss=0.0001745, whisper_loss=0.1067, over 22907.00 frames.
], tot_loss[loss=0.1062, beats_loss=0.0108, ecapa_loss=0.0001754, whisper_loss=0.09366, over 3918330.96 frames. ], batch size: 93, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:39:02,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-08-12 15:39:36,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1706910.0, ans=0.125 2024-08-12 15:40:00,436 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 from AS 2024-08-12 15:40:01,554 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11300, loss[loss=0.1161, beats_loss=0.008998, ecapa_loss=0.0001704, whisper_loss=0.1054, over 17192.00 frames. ], tot_loss[loss=0.1065, beats_loss=0.01078, ecapa_loss=0.0001744, whisper_loss=0.09394, over 3915334.61 frames. ], batch size: 66, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:40:06,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-12 15:40:12,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-12 15:40:20,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+01 2.567e+01 2.768e+01 3.157e+01 8.223e+01, threshold=5.536e+01, percent-clipped=2.0 2024-08-12 15:40:21,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1707210.0, ans=0.2 2024-08-12 15:40:30,674 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 15:40:32,040 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts.
23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-12 15:40:32,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-12 15:40:50,849 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 from AS 2024-08-12 15:40:56,501 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.525e+01 2024-08-12 15:41:00,130 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 15:41:10,189 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11350, loss[loss=0.09939, beats_loss=0.01051, ecapa_loss=0.0001618, whisper_loss=0.08726, over 17670.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01086, ecapa_loss=0.0001739, whisper_loss=0.09323, over 3903769.47 frames. ], batch size: 67, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:41:16,750 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 23 from Vox, 41 from AS 2024-08-12 15:41:18,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.40 vs. limit=22.5 2024-08-12 15:41:24,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1707710.0, ans=0.125 2024-08-12 15:41:42,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs.
limit=5.0 2024-08-12 15:41:56,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1707910.0, ans=0.0 2024-08-12 15:42:05,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1708010.0, ans=0.125 2024-08-12 15:42:07,933 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 from AS 2024-08-12 15:42:17,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11400, loss[loss=0.1077, beats_loss=0.01063, ecapa_loss=0.0001823, whisper_loss=0.09523, over 18915.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01082, ecapa_loss=0.0001745, whisper_loss=0.09351, over 3907709.51 frames. ], batch size: 77, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:42:30,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-12 15:42:32,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1708210.0, ans=0.0 2024-08-12 15:42:36,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.714e+01 3.019e+01 3.288e+01 4.590e+01, threshold=6.038e+01, percent-clipped=0.0 2024-08-12 15:42:37,005 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 from AS 2024-08-12 15:42:38,355 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 from AS 2024-08-12 15:43:00,517 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 from AS 2024-08-12 15:43:08,609 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts.
26 from LS+wenet, 26 from Vox, 42 from AS 2024-08-12 15:43:16,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1708510.0, ans=0.035 2024-08-12 15:43:25,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1708610.0, ans=0.125 2024-08-12 15:43:25,900 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11450, loss[loss=0.1118, beats_loss=0.0113, ecapa_loss=0.0001593, whisper_loss=0.09891, over 22827.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01087, ecapa_loss=0.0001742, whisper_loss=0.0932, over 3877383.00 frames. ], batch size: 90, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:43:31,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1708610.0, ans=0.125 2024-08-12 15:43:35,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1708610.0, ans=0.125 2024-08-12 15:44:11,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1708910.0, ans=0.125 2024-08-12 15:44:23,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-12 15:44:27,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2024-08-12 15:44:31,379 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 from AS 2024-08-12 15:44:34,164 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11500, loss[loss=0.122, beats_loss=0.01002, ecapa_loss=0.0001781, whisper_loss=0.1102, over 17625.00 frames.
], tot_loss[loss=0.1061, beats_loss=0.01083, ecapa_loss=0.0001739, whisper_loss=0.09353, over 3860370.57 frames. ], batch size: 69, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:44:38,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1709110.0, ans=0.125 2024-08-12 15:44:45,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1709110.0, ans=0.0 2024-08-12 15:44:51,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-12 15:44:54,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.425e+01 2.764e+01 3.070e+01 5.781e+01, threshold=5.529e+01, percent-clipped=0.0 2024-08-12 15:44:55,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1709210.0, ans=0.125 2024-08-12 15:45:03,678 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 15:45:16,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. 
limit=10.0 2024-08-12 15:45:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1709410.0, ans=0.04949747468305833 2024-08-12 15:45:24,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1709410.0, ans=0.125 2024-08-12 15:45:24,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1709410.0, ans=0.0 2024-08-12 15:45:46,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-12 15:45:47,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11550, loss[loss=0.1174, beats_loss=0.01129, ecapa_loss=0.0001575, whisper_loss=0.1045, over 22264.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01088, ecapa_loss=0.0001746, whisper_loss=0.09331, over 3853334.36 frames. ], batch size: 89, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:46:01,171 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 13 from Vox, 22 from AS 2024-08-12 15:46:06,586 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
23 from LS+wenet, 30 from Vox, 37 from AS 2024-08-12 15:46:26,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1709910.0, ans=0.125 2024-08-12 15:46:31,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1709910.0, ans=0.125 2024-08-12 15:46:36,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1709910.0, ans=0.1 2024-08-12 15:46:57,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1710010.0, ans=0.125 2024-08-12 15:47:00,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1710010.0, ans=0.125 2024-08-12 15:47:04,050 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11600, loss[loss=0.1184, beats_loss=0.01031, ecapa_loss=0.0001629, whisper_loss=0.1065, over 14900.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01089, ecapa_loss=0.0001736, whisper_loss=0.09269, over 3852248.39 frames. ], batch size: 57, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:47:04,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2024-08-12 15:47:12,057 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts.
20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 15:47:15,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1710110.0, ans=0.0 2024-08-12 15:47:19,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1710110.0, ans=0.1 2024-08-12 15:47:28,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=12.0 2024-08-12 15:47:32,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.125e+01 2.592e+01 2.931e+01 3.257e+01 5.066e+01, threshold=5.862e+01, percent-clipped=0.0 2024-08-12 15:47:41,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1710210.0, ans=0.125 2024-08-12 15:47:53,070 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-12 15:48:24,490 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-12 15:48:24,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1710410.0, ans=6.0 2024-08-12 15:48:30,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1710510.0, ans=0.125 2024-08-12 15:48:34,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.68 vs. limit=5.0 2024-08-12 15:48:45,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-12 15:48:47,775 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
21 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 15:48:51,756 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11650, loss[loss=0.1189, beats_loss=0.008423, ecapa_loss=0.0002074, whisper_loss=0.1084, over 20392.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01087, ecapa_loss=0.0001747, whisper_loss=0.09285, over 3847803.16 frames. ], batch size: 83, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:49:00,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1710610.0, ans=6.0 2024-08-12 15:49:02,271 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 24 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-12 15:49:21,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1710710.0, ans=0.0 2024-08-12 15:49:41,050 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-12 15:49:51,947 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 15:49:56,392 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 15:50:02,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1710810.0, ans=0.0 2024-08-12 15:50:15,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1710910.0, ans=0.1 2024-08-12 15:50:34,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1710910.0, ans=0.125 2024-08-12 15:50:50,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1711010.0, ans=0.2 2024-08-12 15:50:50,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1711010.0, ans=0.125 2024-08-12 15:50:50,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1711010.0, ans=0.125 2024-08-12 15:51:06,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11700, loss[loss=0.1367, beats_loss=0.007424, ecapa_loss=0.0002216, whisper_loss=0.1271, over 20179.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.011, ecapa_loss=0.0001736, whisper_loss=0.09288, over 3883423.41 frames. ], batch size: 80, lr: 5.19e-03, grad_scale: 5.764607523034235e+17 2024-08-12 15:51:12,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1711110.0, ans=0.0 2024-08-12 15:51:40,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.19 vs. 
limit=22.5 2024-08-12 15:51:45,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.679e+01 3.031e+01 3.384e+01 8.068e+01, threshold=6.063e+01, percent-clipped=1.0 2024-08-12 15:51:45,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1711210.0, ans=0.0 2024-08-12 15:51:59,716 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-12 15:52:29,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0 2024-08-12 15:52:33,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-12 15:52:38,267 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 15:52:38,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1711410.0, ans=0.1 2024-08-12 15:53:08,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1711510.0, ans=0.125 2024-08-12 15:53:20,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11750, loss[loss=0.1036, beats_loss=0.01122, ecapa_loss=0.0001899, whisper_loss=0.09052, over 21456.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01104, ecapa_loss=0.0001741, whisper_loss=0.09283, over 3900692.59 frames. 
], batch size: 88, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:53:51,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1711710.0, ans=0.125 2024-08-12 15:54:11,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1711810.0, ans=0.07 2024-08-12 15:54:22,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1711810.0, ans=0.1 2024-08-12 15:54:31,273 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 15:54:42,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-12 15:54:49,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1712010.0, ans=0.2 2024-08-12 15:55:02,307 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11800, loss[loss=0.1129, beats_loss=0.01077, ecapa_loss=0.0001574, whisper_loss=0.1006, over 23563.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01105, ecapa_loss=0.0001746, whisper_loss=0.09256, over 3935626.03 frames. 
], batch size: 91, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:55:07,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1712110.0, ans=0.0 2024-08-12 15:55:11,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1712110.0, ans=0.125 2024-08-12 15:55:11,309 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.011e-02 2024-08-12 15:55:30,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.421e+01 2.823e+01 3.255e+01 8.063e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 15:55:34,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-12 15:55:56,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1712410.0, ans=0.125 2024-08-12 15:56:05,935 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 15:56:25,248 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 15:56:31,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11850, loss[loss=0.1008, beats_loss=0.008013, ecapa_loss=0.0001871, whisper_loss=0.09094, over 16745.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.011, ecapa_loss=0.0001741, whisper_loss=0.09243, over 3922009.74 frames. 
], batch size: 65, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:56:38,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1712610.0, ans=10.0 2024-08-12 15:56:40,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1712610.0, ans=0.0 2024-08-12 15:57:06,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1712810.0, ans=0.125 2024-08-12 15:57:16,268 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 15:57:46,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1713010.0, ans=0.125 2024-08-12 15:57:46,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1713010.0, ans=0.1 2024-08-12 15:57:50,781 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.762e+01 2024-08-12 15:57:52,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1713010.0, ans=0.2 2024-08-12 15:57:57,458 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 15:57:58,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11900, loss[loss=0.09003, beats_loss=0.01327, ecapa_loss=0.0001454, whisper_loss=0.07531, over 20846.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001745, whisper_loss=0.09247, over 3940328.55 frames. 
], batch size: 85, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:58:01,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1713110.0, ans=0.0 2024-08-12 15:58:03,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1713110.0, ans=0.05 2024-08-12 15:58:08,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1713110.0, ans=0.025 2024-08-12 15:58:13,567 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 15:58:13,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1713110.0, ans=0.2 2024-08-12 15:58:24,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.471e+01 2.746e+01 3.069e+01 1.141e+02, threshold=5.492e+01, percent-clipped=1.0 2024-08-12 15:58:25,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0 2024-08-12 15:58:41,980 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 15:58:52,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-08-12 15:59:07,236 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 15:59:17,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1713510.0, ans=0.0 2024-08-12 15:59:24,026 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 11950, loss[loss=0.1082, beats_loss=0.00994, ecapa_loss=0.0001817, whisper_loss=0.09649, over 16223.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01092, ecapa_loss=0.0001759, whisper_loss=0.09272, over 3910402.96 frames. ], batch size: 63, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 15:59:31,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1713610.0, ans=0.125 2024-08-12 15:59:37,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-12 15:59:44,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1713710.0, ans=0.07 2024-08-12 15:59:49,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1713710.0, ans=0.2 2024-08-12 16:00:04,756 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 19 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-12 16:00:04,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1713810.0, ans=0.125 2024-08-12 16:00:13,880 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:00:14,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.04 vs. 
limit=10.0 2024-08-12 16:00:26,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-08-12 16:00:27,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1713910.0, ans=0.125 2024-08-12 16:00:35,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1714010.0, ans=0.1 2024-08-12 16:00:37,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1714010.0, ans=0.0 2024-08-12 16:00:50,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12000, loss[loss=0.1068, beats_loss=0.01249, ecapa_loss=0.0001557, whisper_loss=0.09273, over 22955.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01088, ecapa_loss=0.0001754, whisper_loss=0.09306, over 3906405.47 frames. ], batch size: 92, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:00:50,424 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 16:01:32,485 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005955, whisper_loss=0.2482, over 922467.00 frames. 2024-08-12 16:01:52,022 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on SV_voxceleb1: loss=0.004759, beats_loss=0, ecapa_loss=0.0004759, whisper_loss=0, over 939242.00 frames. 2024-08-12 16:02:14,890 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2019, 1.7390, 1.4868, 1.1450, 1.3135, 1.3341, 1.6679, 1.4414], device='cuda:0') 2024-08-12 16:03:43,545 INFO [train_multi_KD3.py:1149] (0/4) Epoch 12, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-12 16:03:43,549 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 16:03:45,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1714110.0, ans=0.125 2024-08-12 16:03:47,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-12 16:03:53,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1714110.0, ans=0.125 2024-08-12 16:03:53,140 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.878e+05 2024-08-12 16:04:06,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.028e+01 2.437e+01 2.734e+01 3.186e+01 7.564e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 16:04:10,133 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-12 16:04:11,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1714210.0, ans=0.2 2024-08-12 16:04:16,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1714310.0, ans=0.2 2024-08-12 16:04:26,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-12 16:04:31,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-12 16:04:35,761 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 16:04:46,292 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 16:04:50,598 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 16:04:51,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-12 16:04:53,363 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-12 16:04:55,024 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 16:04:59,159 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12050, loss[loss=0.0956, beats_loss=0.01219, ecapa_loss=0.000177, whisper_loss=0.08163, over 21719.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.0001756, whisper_loss=0.09189, over 3847695.11 frames. ], batch size: 89, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:05:06,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1714610.0, ans=0.125 2024-08-12 16:05:15,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1714710.0, ans=0.0 2024-08-12 16:05:54,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1714910.0, ans=0.0 2024-08-12 16:06:11,057 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 22 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 16:06:14,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1715110.0, ans=0.0 2024-08-12 16:06:15,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12100, loss[loss=0.1226, beats_loss=0.01046, ecapa_loss=0.0002172, whisper_loss=0.1099, over 22724.00 frames. 
], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001764, whisper_loss=0.09221, over 3857550.95 frames. ], batch size: 94, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:06:19,010 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 16:06:37,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1715210.0, ans=0.0 2024-08-12 16:06:37,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1715210.0, ans=0.125 2024-08-12 16:06:38,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.371e+01 2.653e+01 2.949e+01 4.098e+01, threshold=5.305e+01, percent-clipped=0.0 2024-08-12 16:06:38,502 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 16:06:43,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1715210.0, ans=0.0 2024-08-12 16:07:06,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1715410.0, ans=0.125 2024-08-12 16:07:22,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-12 16:07:27,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1715510.0, ans=0.125 2024-08-12 16:07:33,552 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 16:07:36,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12150, loss[loss=0.08457, beats_loss=0.01096, ecapa_loss=0.0001678, whisper_loss=0.07193, over 17544.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001766, whisper_loss=0.09204, over 3835327.31 frames. ], batch size: 68, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:07:46,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1715610.0, ans=0.125 2024-08-12 16:07:51,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1715710.0, ans=0.0 2024-08-12 16:08:12,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1715810.0, ans=0.125 2024-08-12 16:08:13,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1715810.0, ans=0.0 2024-08-12 16:08:40,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1716010.0, ans=0.125 2024-08-12 16:08:40,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1716010.0, ans=0.0 2024-08-12 16:08:48,144 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 16:08:51,793 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12200, loss[loss=0.09385, beats_loss=0.01194, ecapa_loss=0.0001458, whisper_loss=0.08045, over 16748.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001758, whisper_loss=0.09269, over 3862272.25 frames. 
], batch size: 67, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:09:13,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.462e+01 2.887e+01 3.237e+01 1.771e+02, threshold=5.773e+01, percent-clipped=2.0 2024-08-12 16:09:20,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1716310.0, ans=0.125 2024-08-12 16:09:23,397 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 16:09:26,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1716310.0, ans=0.125 2024-08-12 16:09:40,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1716410.0, ans=0.0 2024-08-12 16:09:40,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0 2024-08-12 16:09:55,532 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 12 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-12 16:10:01,542 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-12 16:10:07,301 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12250, loss[loss=0.09331, beats_loss=0.01132, ecapa_loss=0.0001524, whisper_loss=0.08047, over 20626.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01079, ecapa_loss=0.000177, whisper_loss=0.09296, over 3847317.82 frames. ], batch size: 80, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:10:29,751 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 16:10:29,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1716710.0, ans=0.1 2024-08-12 16:10:31,160 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-12 16:10:32,873 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 16:10:40,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-12 16:10:42,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1716810.0, ans=0.125 2024-08-12 16:10:43,906 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 16:11:01,672 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 16:11:02,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=22.5 2024-08-12 16:11:09,543 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 16:11:27,648 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12300, loss[loss=0.1147, beats_loss=0.008722, ecapa_loss=0.0001831, whisper_loss=0.1042, over 21073.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.0108, ecapa_loss=0.000178, whisper_loss=0.09296, over 3853881.54 frames. ], batch size: 79, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:11:34,166 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 16:11:52,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.615e+01 2.930e+01 3.275e+01 9.862e+01, threshold=5.860e+01, percent-clipped=1.0 2024-08-12 16:12:46,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1717510.0, ans=0.125 2024-08-12 16:12:51,303 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12350, loss[loss=0.1032, beats_loss=0.009647, ecapa_loss=0.0001671, whisper_loss=0.09193, over 17649.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0108, ecapa_loss=0.0001783, whisper_loss=0.093, over 3853139.46 frames. ], batch size: 68, lr: 5.18e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:12:55,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1717610.0, ans=0.0 2024-08-12 16:13:03,389 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 16:13:10,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1717710.0, ans=0.0 2024-08-12 16:13:10,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-12 16:13:14,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1717710.0, ans=0.1 2024-08-12 16:13:16,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1717710.0, ans=0.0 2024-08-12 16:13:33,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1717810.0, ans=0.1 2024-08-12 16:13:42,969 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 16:13:59,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1718010.0, ans=0.0 2024-08-12 16:14:05,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1718010.0, ans=0.125 2024-08-12 16:14:14,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12400, loss[loss=0.09643, beats_loss=0.01309, ecapa_loss=0.0001621, whisper_loss=0.08171, over 22009.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01077, ecapa_loss=0.0001781, whisper_loss=0.09334, over 3861770.72 frames. ], batch size: 90, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:14:16,596 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 16:14:16,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1718110.0, ans=0.2 2024-08-12 16:14:25,578 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 16:14:37,999 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
14 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-12 16:14:40,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.106e+01 2.684e+01 3.067e+01 3.396e+01 5.308e+01, threshold=6.133e+01, percent-clipped=1.0 2024-08-12 16:14:40,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1718210.0, ans=0.0 2024-08-12 16:14:54,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1718310.0, ans=0.0 2024-08-12 16:15:07,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1718410.0, ans=0.1 2024-08-12 16:15:14,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1718410.0, ans=0.0 2024-08-12 16:15:18,342 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 16:15:24,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1718510.0, ans=0.2 2024-08-12 16:15:24,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1718510.0, ans=0.2 2024-08-12 16:15:36,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12450, loss[loss=0.1088, beats_loss=0.01381, ecapa_loss=0.0001376, whisper_loss=0.09356, over 16706.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01075, ecapa_loss=0.0001777, whisper_loss=0.09349, over 3881058.30 frames. ], batch size: 67, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:16:12,639 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 16:16:12,931 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.061e+02 2024-08-12 16:16:16,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. limit=10.0 2024-08-12 16:16:21,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1718810.0, ans=0.1 2024-08-12 16:16:24,737 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 16:16:56,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12500, loss[loss=0.1042, beats_loss=0.01203, ecapa_loss=0.0001448, whisper_loss=0.09069, over 22075.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001763, whisper_loss=0.09261, over 3869476.21 frames. ], batch size: 87, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:17:19,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.385e+01 2.736e+01 3.208e+01 9.127e+01, threshold=5.473e+01, percent-clipped=1.0 2024-08-12 16:17:21,445 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 29 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 16:17:21,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1719210.0, ans=0.125 2024-08-12 16:17:29,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=15.0 2024-08-12 16:18:02,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1719510.0, ans=0.2 2024-08-12 16:18:04,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1719510.0, ans=0.1 2024-08-12 16:18:14,289 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 16:18:16,542 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12550, loss[loss=0.09324, beats_loss=0.01303, ecapa_loss=0.0001286, whisper_loss=0.07892, over 22350.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01084, ecapa_loss=0.0001757, whisper_loss=0.09282, over 3884397.83 frames. ], batch size: 88, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:19:18,423 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-172000.pt 2024-08-12 16:19:29,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=12.0 2024-08-12 16:19:38,872 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12600, loss[loss=0.1099, beats_loss=0.009989, ecapa_loss=0.0001987, whisper_loss=0.09793, over 23183.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01085, ecapa_loss=0.0001752, whisper_loss=0.09294, over 3881726.03 frames. ], batch size: 92, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:19:43,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2024-08-12 16:19:55,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-12 16:19:57,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1720210.0, ans=0.0 2024-08-12 16:20:03,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.583e+01 2.914e+01 3.404e+01 5.799e+01, threshold=5.828e+01, percent-clipped=1.0 2024-08-12 16:20:11,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1720310.0, ans=0.125 2024-08-12 16:20:25,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1720410.0, ans=0.1 2024-08-12 16:20:27,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1720410.0, ans=0.1 2024-08-12 16:20:37,423 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.766e+05 2024-08-12 16:20:57,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1720610.0, ans=0.125 2024-08-12 16:20:58,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12650, loss[loss=0.1165, beats_loss=0.01249, ecapa_loss=0.0001813, whisper_loss=0.1022, over 18501.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01097, ecapa_loss=0.0001748, whisper_loss=0.09278, over 3888933.53 frames. ], batch size: 73, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:21:05,008 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 16:21:08,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1720610.0, ans=0.1 2024-08-12 16:21:14,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1720710.0, ans=0.125 2024-08-12 16:21:24,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1720710.0, ans=0.125 2024-08-12 16:21:39,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-12 16:21:40,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1720810.0, ans=0.1 2024-08-12 16:22:02,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.90 vs. limit=10.0 2024-08-12 16:22:04,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1721010.0, ans=0.125 2024-08-12 16:22:10,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1721010.0, ans=0.95 2024-08-12 16:22:11,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1721010.0, ans=0.0 2024-08-12 16:22:14,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1721110.0, ans=0.2 2024-08-12 16:22:16,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12700, loss[loss=0.1179, beats_loss=0.01064, ecapa_loss=0.0001696, whisper_loss=0.1056, over 22214.00 frames. 
], tot_loss[loss=0.106, beats_loss=0.01095, ecapa_loss=0.0001749, whisper_loss=0.09329, over 3874769.02 frames. ], batch size: 86, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:22:37,977 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.725e-02 2024-08-12 16:22:40,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.411e+01 2.657e+01 2.975e+01 5.020e+01, threshold=5.313e+01, percent-clipped=0.0 2024-08-12 16:22:40,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1721210.0, ans=0.125 2024-08-12 16:22:56,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1721310.0, ans=0.0 2024-08-12 16:23:05,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1721410.0, ans=0.125 2024-08-12 16:23:05,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1721410.0, ans=0.125 2024-08-12 16:23:08,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1721410.0, ans=0.2 2024-08-12 16:23:18,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1721510.0, ans=0.2 2024-08-12 16:23:35,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12750, loss[loss=0.09794, beats_loss=0.01278, ecapa_loss=0.0001815, whisper_loss=0.08335, over 18695.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0111, ecapa_loss=0.0001757, whisper_loss=0.09222, over 3878764.36 frames. 
], batch size: 77, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:23:54,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1721710.0, ans=0.125 2024-08-12 16:24:12,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-12 16:24:28,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1721910.0, ans=0.125 2024-08-12 16:24:32,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=12.0 2024-08-12 16:24:57,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12800, loss[loss=0.1206, beats_loss=0.01043, ecapa_loss=0.0001561, whisper_loss=0.1086, over 23641.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.0111, ecapa_loss=0.0001767, whisper_loss=0.09257, over 3906419.61 frames. 
], batch size: 91, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:25:06,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1722110.0, ans=0.035 2024-08-12 16:25:21,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.602e+01 2.886e+01 3.279e+01 7.661e+01, threshold=5.773e+01, percent-clipped=1.0 2024-08-12 16:25:32,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1722310.0, ans=10.0 2024-08-12 16:25:54,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1722410.0, ans=0.0 2024-08-12 16:26:00,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1722510.0, ans=0.125 2024-08-12 16:26:04,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1722510.0, ans=0.125 2024-08-12 16:26:05,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-12 16:26:18,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12850, loss[loss=0.1011, beats_loss=0.009333, ecapa_loss=0.0001623, whisper_loss=0.09016, over 16197.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01112, ecapa_loss=0.0001761, whisper_loss=0.09186, over 3874382.62 frames. ], batch size: 58, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:26:25,705 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 16:26:29,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1722610.0, ans=0.0 2024-08-12 16:26:41,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1722710.0, ans=0.0 2024-08-12 16:26:41,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-12 16:26:52,542 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-12 16:27:00,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1722810.0, ans=0.05 2024-08-12 16:27:11,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1722910.0, ans=0.125 2024-08-12 16:27:18,136 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 14 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 16:27:23,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1722910.0, ans=0.0 2024-08-12 16:27:33,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1723010.0, ans=0.0 2024-08-12 16:27:39,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1723110.0, ans=0.125 2024-08-12 16:27:40,668 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12900, loss[loss=0.0969, beats_loss=0.01134, ecapa_loss=0.0002028, whisper_loss=0.08352, over 21358.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01115, ecapa_loss=0.0001773, whisper_loss=0.09122, over 3855982.34 frames. 
], batch size: 93, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:27:49,993 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 24 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-12 16:27:53,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1723110.0, ans=0.0 2024-08-12 16:28:02,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1723210.0, ans=0.5 2024-08-12 16:28:05,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.675e+01 2.950e+01 4.604e+01, threshold=5.350e+01, percent-clipped=0.0 2024-08-12 16:28:13,273 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-12 16:28:20,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=10.0 2024-08-12 16:28:42,450 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-12 16:28:53,886 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 16:29:03,578 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 12950, loss[loss=0.1256, beats_loss=0.0079, ecapa_loss=0.0002129, whisper_loss=0.1156, over 20877.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.0001775, whisper_loss=0.09216, over 3847285.35 frames. ], batch size: 80, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:29:03,774 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 16:29:07,050 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
19 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-12 16:29:13,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1723610.0, ans=0.0 2024-08-12 16:29:18,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1723710.0, ans=0.0 2024-08-12 16:29:37,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1723810.0, ans=0.1 2024-08-12 16:29:50,038 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 35 from LS+wenet, 36 from Vox, 22 fro AS 2024-08-12 16:29:58,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1723910.0, ans=0.0 2024-08-12 16:30:28,572 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-12 16:30:30,690 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13000, loss[loss=0.1067, beats_loss=0.01023, ecapa_loss=0.0002064, whisper_loss=0.0944, over 22019.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001769, whisper_loss=0.0923, over 3852856.51 frames. ], batch size: 94, lr: 5.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:30:47,424 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-12 16:30:55,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.537e+01 2.771e+01 3.073e+01 6.149e+01, threshold=5.541e+01, percent-clipped=2.0 2024-08-12 16:31:00,634 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 16:31:11,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1724310.0, ans=0.125 2024-08-12 16:31:14,725 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 16:31:18,779 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 35 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 16:31:54,596 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13050, loss[loss=0.08948, beats_loss=0.01163, ecapa_loss=0.0001997, whisper_loss=0.07585, over 15459.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001769, whisper_loss=0.09183, over 3821193.50 frames. ], batch size: 64, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:31:58,615 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:31:59,856 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 16:32:11,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1724710.0, ans=0.125 2024-08-12 16:32:34,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1724810.0, ans=0.0 2024-08-12 16:32:44,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1724910.0, ans=0.1 2024-08-12 16:32:49,454 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 16:32:53,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1724910.0, ans=0.125 2024-08-12 16:33:06,953 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-12 16:33:17,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13100, loss[loss=0.1211, beats_loss=0.008898, ecapa_loss=0.0001839, whisper_loss=0.1104, over 17457.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001759, whisper_loss=0.09218, over 3821541.74 frames. ], batch size: 67, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:33:37,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1725210.0, ans=0.125 2024-08-12 16:33:41,195 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.092e+01 2.633e+01 2.841e+01 3.164e+01 5.259e+01, threshold=5.682e+01, percent-clipped=0.0 2024-08-12 16:33:49,669 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 16:33:49,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1725310.0, ans=0.125 2024-08-12 16:33:56,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1725310.0, ans=0.2 2024-08-12 16:34:07,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.57 vs. limit=15.0 2024-08-12 16:34:10,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1725410.0, ans=0.0 2024-08-12 16:34:16,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1725410.0, ans=0.1 2024-08-12 16:34:16,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-12 16:34:21,744 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 16:34:38,521 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13150, loss[loss=0.08723, beats_loss=0.01342, ecapa_loss=0.0001554, whisper_loss=0.07225, over 15306.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001761, whisper_loss=0.09201, over 3831366.23 frames. ], batch size: 60, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:34:39,060 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 16:34:41,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1725610.0, ans=0.0 2024-08-12 16:34:54,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2024-08-12 16:34:55,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1725710.0, ans=0.2 2024-08-12 16:35:05,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1725710.0, ans=0.0 2024-08-12 16:35:52,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=1726010.0, ans=10.0 2024-08-12 16:36:02,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13200, loss[loss=0.1001, beats_loss=0.01201, ecapa_loss=0.0001127, whisper_loss=0.08698, over 17622.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001748, whisper_loss=0.09186, over 3798041.01 frames. ], batch size: 65, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:36:15,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1726110.0, ans=0.2 2024-08-12 16:36:16,487 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 16:36:25,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.556e+01 2.815e+01 3.284e+01 6.256e+01, threshold=5.630e+01, percent-clipped=1.0 2024-08-12 16:36:32,340 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 16:36:35,050 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-12 16:36:44,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-08-12 16:37:02,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1726410.0, ans=0.125 2024-08-12 16:37:08,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1726510.0, ans=0.125 2024-08-12 16:37:10,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1726510.0, ans=0.0 2024-08-12 16:37:10,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.92 vs. limit=10.0 2024-08-12 16:37:24,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13250, loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001807, whisper_loss=0.09073, over 19024.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01085, ecapa_loss=0.000176, whisper_loss=0.09229, over 3785719.48 frames. 
], batch size: 77, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:37:30,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1726610.0, ans=0.0 2024-08-12 16:37:52,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1726710.0, ans=0.125 2024-08-12 16:37:55,884 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 16:38:31,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1727010.0, ans=0.0 2024-08-12 16:38:37,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-08-12 16:38:40,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1727010.0, ans=0.0 2024-08-12 16:38:47,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-12 16:38:49,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13300, loss[loss=0.07625, beats_loss=0.01279, ecapa_loss=0.0001881, whisper_loss=0.06158, over 19268.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.0001755, whisper_loss=0.09178, over 3793285.28 frames. 
], batch size: 81, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:38:51,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1727110.0, ans=0.1 2024-08-12 16:38:53,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1727110.0, ans=0.125 2024-08-12 16:39:11,593 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 16 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-12 16:39:12,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.550e+01 2.829e+01 3.095e+01 6.127e+01, threshold=5.657e+01, percent-clipped=1.0 2024-08-12 16:39:23,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2024-08-12 16:39:52,101 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 16:39:52,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2024-08-12 16:39:55,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1727510.0, ans=0.0 2024-08-12 16:40:05,309 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 16:40:09,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13350, loss[loss=0.1128, beats_loss=0.01174, ecapa_loss=0.000165, whisper_loss=0.09942, over 23078.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01095, ecapa_loss=0.0001747, whisper_loss=0.09187, over 3800526.53 frames. 
], batch size: 91, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:40:47,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1727810.0, ans=0.125 2024-08-12 16:40:51,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-08-12 16:41:04,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1727910.0, ans=0.1 2024-08-12 16:41:07,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1727910.0, ans=0.2 2024-08-12 16:41:22,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1728010.0, ans=0.125 2024-08-12 16:41:30,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1728110.0, ans=0.1 2024-08-12 16:41:31,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13400, loss[loss=0.1003, beats_loss=0.01062, ecapa_loss=0.000132, whisper_loss=0.08833, over 22547.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001752, whisper_loss=0.09166, over 3822448.99 frames. ], batch size: 84, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:41:35,728 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 16:41:54,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.756e+01 3.172e+01 3.565e+01 5.325e+01, threshold=6.343e+01, percent-clipped=0.0 2024-08-12 16:42:06,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1728310.0, ans=0.0 2024-08-12 16:42:13,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1728310.0, ans=0.0 2024-08-12 16:42:43,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1728510.0, ans=0.05 2024-08-12 16:42:49,247 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 16:42:50,927 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13450, loss[loss=0.09088, beats_loss=0.0113, ecapa_loss=0.0001534, whisper_loss=0.07805, over 16983.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01101, ecapa_loss=0.0001751, whisper_loss=0.09148, over 3865596.14 frames. ], batch size: 68, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:42:57,520 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 16:43:02,473 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 16:43:18,911 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 16:43:31,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1728810.0, ans=0.0 2024-08-12 16:43:45,575 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 16:43:52,129 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 16:43:52,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1728910.0, ans=0.0 2024-08-12 16:44:15,009 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13500, loss[loss=0.1089, beats_loss=0.008651, ecapa_loss=0.0001494, whisper_loss=0.09874, over 17539.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001761, whisper_loss=0.09226, over 3873582.02 frames. ], batch size: 64, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:44:38,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.512e+01 2.797e+01 3.062e+01 5.746e+01, threshold=5.594e+01, percent-clipped=0.0 2024-08-12 16:44:41,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1729210.0, ans=0.125 2024-08-12 16:44:51,461 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 16:44:53,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1729310.0, ans=0.035 2024-08-12 16:45:00,044 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-12 16:45:16,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.24 vs. 
limit=15.0 2024-08-12 16:45:29,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1729510.0, ans=0.125 2024-08-12 16:45:29,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1729510.0, ans=0.0 2024-08-12 16:45:30,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1729510.0, ans=0.035 2024-08-12 16:45:30,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1729510.0, ans=0.1 2024-08-12 16:45:38,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13550, loss[loss=0.1095, beats_loss=0.01074, ecapa_loss=0.0001837, whisper_loss=0.09693, over 23302.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01086, ecapa_loss=0.0001774, whisper_loss=0.0927, over 3877767.52 frames. ], batch size: 95, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:45:41,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1729610.0, ans=0.125 2024-08-12 16:45:44,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1729610.0, ans=0.125 2024-08-12 16:45:51,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1729610.0, ans=0.1 2024-08-12 16:45:54,426 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 13 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 16:46:15,210 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 16:46:16,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.71 vs. 
limit=15.0 2024-08-12 16:46:16,565 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 16:46:16,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1729810.0, ans=0.125 2024-08-12 16:46:25,969 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 16:47:02,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730010.0, ans=0.1 2024-08-12 16:47:05,958 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13600, loss[loss=0.09583, beats_loss=0.0139, ecapa_loss=0.000169, whisper_loss=0.08024, over 21299.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01095, ecapa_loss=0.0001767, whisper_loss=0.09198, over 3859906.66 frames. ], batch size: 89, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:47:25,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1730210.0, ans=0.0 2024-08-12 16:47:31,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.482e+01 2.733e+01 3.104e+01 2.478e+02, threshold=5.467e+01, percent-clipped=1.0 2024-08-12 16:47:31,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1730210.0, ans=0.125 2024-08-12 16:47:33,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1730210.0, ans=0.125 2024-08-12 16:47:34,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=12.0 2024-08-12 16:47:42,210 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 16:47:47,913 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 16:47:48,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1730310.0, ans=0.1 2024-08-12 16:47:48,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-12 16:47:50,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1730310.0, ans=0.125 2024-08-12 16:47:51,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2024-08-12 16:48:12,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1730510.0, ans=0.5 2024-08-12 16:48:21,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1730510.0, ans=0.1 2024-08-12 16:48:31,143 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13650, loss[loss=0.0924, beats_loss=0.01349, ecapa_loss=0.0001096, whisper_loss=0.07782, over 16189.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01096, ecapa_loss=0.0001766, whisper_loss=0.0923, over 3864010.24 frames. ], batch size: 60, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:48:41,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1730610.0, ans=0.2 2024-08-12 16:48:50,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1730710.0, ans=0.1 2024-08-12 16:49:05,415 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-12 16:49:06,788 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
32 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-12 16:49:07,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1730810.0, ans=0.1 2024-08-12 16:49:21,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1730910.0, ans=0.0 2024-08-12 16:50:06,981 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13700, loss[loss=0.09409, beats_loss=0.01018, ecapa_loss=0.0001842, whisper_loss=0.08207, over 21486.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01096, ecapa_loss=0.0001758, whisper_loss=0.09292, over 3881160.52 frames. ], batch size: 89, lr: 5.16e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 16:50:11,752 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 6 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-12 16:50:11,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1731110.0, ans=0.2 2024-08-12 16:50:17,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1731110.0, ans=0.125 2024-08-12 16:50:29,472 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-12 16:50:34,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.487e+01 2.754e+01 3.214e+01 5.264e+01, threshold=5.508e+01, percent-clipped=0.0 2024-08-12 16:50:39,206 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:50:48,296 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-12 16:51:03,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1731410.0, ans=0.1 2024-08-12 16:51:33,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13750, loss[loss=0.09699, beats_loss=0.01247, ecapa_loss=0.0002069, whisper_loss=0.08245, over 19820.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01089, ecapa_loss=0.0001754, whisper_loss=0.09306, over 3860284.91 frames. ], batch size: 83, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:51:36,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1731610.0, ans=0.1 2024-08-12 16:52:10,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2024-08-12 16:52:17,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1731810.0, ans=0.125 2024-08-12 16:52:32,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1731910.0, ans=0.2 2024-08-12 16:52:36,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1731910.0, ans=0.0 2024-08-12 16:52:59,349 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13800, loss[loss=0.08835, beats_loss=0.009398, ecapa_loss=0.0002099, whisper_loss=0.07686, over 15258.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01087, ecapa_loss=0.000175, whisper_loss=0.09264, over 3822516.91 frames. ], batch size: 63, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:53:01,354 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 16:53:16,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1732210.0, ans=0.125 2024-08-12 16:53:21,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-12 16:53:23,374 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.580e+02 2024-08-12 16:53:24,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.542e+01 2.940e+01 3.312e+01 1.437e+02, threshold=5.879e+01, percent-clipped=2.0 2024-08-12 16:54:26,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1732610.0, ans=0.2 2024-08-12 16:54:28,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13850, loss[loss=0.1355, beats_loss=0.009635, ecapa_loss=0.0001991, whisper_loss=0.1239, over 22284.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001739, whisper_loss=0.09192, over 3836548.31 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:54:39,034 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 16:54:54,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1732710.0, ans=0.125 2024-08-12 16:55:04,640 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 16:55:18,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1732910.0, ans=0.2 2024-08-12 16:55:44,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 16:55:53,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-12 16:55:59,220 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13900, loss[loss=0.08914, beats_loss=0.01181, ecapa_loss=0.0001661, whisper_loss=0.07566, over 14175.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001727, whisper_loss=0.09182, over 3880068.77 frames. ], batch size: 57, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:56:04,729 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 16:56:05,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2024-08-12 16:56:18,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1733210.0, ans=0.0 2024-08-12 16:56:23,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1733210.0, ans=0.125 2024-08-12 16:56:25,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.635e+01 2.870e+01 3.246e+01 6.120e+01, threshold=5.740e+01, percent-clipped=1.0 2024-08-12 16:56:28,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1733210.0, ans=0.0 2024-08-12 16:56:46,248 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 16:57:01,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1733410.0, ans=0.05 2024-08-12 16:57:21,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 13950, loss[loss=0.0973, beats_loss=0.01149, ecapa_loss=0.0001669, whisper_loss=0.08414, over 22889.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01101, ecapa_loss=0.0001726, whisper_loss=0.09198, over 3907819.66 frames. ], batch size: 91, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:57:27,305 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 27 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 16:57:28,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1733610.0, ans=0.125 2024-08-12 16:57:30,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1733610.0, ans=0.125 2024-08-12 16:57:30,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1733610.0, ans=0.2 2024-08-12 16:57:43,533 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 16:57:45,514 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 16:57:45,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-12 16:58:00,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1733810.0, ans=0.2 2024-08-12 16:58:20,195 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 16:58:29,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1734010.0, ans=0.125 2024-08-12 16:58:31,838 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 16:58:33,009 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 16:58:34,660 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 16:58:41,826 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-12 16:58:44,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14000, loss[loss=0.09684, beats_loss=0.01131, ecapa_loss=0.0001803, whisper_loss=0.08372, over 16927.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001714, whisper_loss=0.09207, over 3893350.07 frames. ], batch size: 67, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 16:58:45,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1734110.0, ans=0.125 2024-08-12 16:59:01,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1734210.0, ans=0.125 2024-08-12 16:59:02,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.79 vs. limit=22.5 2024-08-12 16:59:09,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.481e+01 2.768e+01 3.199e+01 7.750e+01, threshold=5.536e+01, percent-clipped=1.0 2024-08-12 16:59:09,702 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 39 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 16:59:13,154 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-12 16:59:23,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1734310.0, ans=0.1 2024-08-12 16:59:26,681 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 26 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-12 17:00:00,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1734510.0, ans=0.125 2024-08-12 17:00:14,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14050, loss[loss=0.09948, beats_loss=0.01343, ecapa_loss=0.0001451, whisper_loss=0.08459, over 19760.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001721, whisper_loss=0.0924, over 3883967.97 frames. ], batch size: 80, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:00:32,136 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 17:00:55,242 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-12 17:01:16,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1734910.0, ans=0.5 2024-08-12 17:01:20,862 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 17:01:24,990 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 20 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 17:01:38,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-12 17:01:41,459 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14100, loss[loss=0.09891, beats_loss=0.01184, ecapa_loss=0.0001225, whisper_loss=0.08585, over 15749.00 frames. 
], tot_loss[loss=0.1048, beats_loss=0.01102, ecapa_loss=0.0001709, whisper_loss=0.09203, over 3903384.72 frames. ], batch size: 58, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:01:52,324 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-12 17:02:10,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.069e+01 2.510e+01 2.862e+01 3.257e+01 4.688e+01, threshold=5.723e+01, percent-clipped=0.0 2024-08-12 17:02:16,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0 2024-08-12 17:02:38,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1735410.0, ans=0.125 2024-08-12 17:02:43,453 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-12 17:02:46,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1735410.0, ans=0.0 2024-08-12 17:02:57,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2024-08-12 17:03:10,893 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14150, loss[loss=0.08662, beats_loss=0.009264, ecapa_loss=0.0002118, whisper_loss=0.07523, over 18768.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01114, ecapa_loss=0.0001714, whisper_loss=0.09115, over 3883537.71 frames. ], batch size: 75, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:03:15,474 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 17:03:47,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1735710.0, ans=0.025 2024-08-12 17:04:06,120 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 17:04:16,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1735910.0, ans=0.0 2024-08-12 17:04:21,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1735910.0, ans=0.0 2024-08-12 17:04:32,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1736010.0, ans=0.0 2024-08-12 17:04:50,222 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14200, loss[loss=0.09448, beats_loss=0.01379, ecapa_loss=0.000131, whisper_loss=0.07938, over 23240.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01115, ecapa_loss=0.0001699, whisper_loss=0.09104, over 3888180.94 frames. ], batch size: 92, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:05:05,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1736210.0, ans=0.125 2024-08-12 17:05:14,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+01 2.568e+01 2.822e+01 3.210e+01 8.568e+01, threshold=5.645e+01, percent-clipped=1.0 2024-08-12 17:05:16,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1736210.0, ans=0.0 2024-08-12 17:05:22,461 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 17:05:24,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. 
limit=15.0 2024-08-12 17:05:27,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2024-08-12 17:05:30,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1736310.0, ans=0.0 2024-08-12 17:05:34,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-08-12 17:05:38,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2024-08-12 17:05:48,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1736410.0, ans=0.125 2024-08-12 17:06:10,894 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14250, loss[loss=0.1063, beats_loss=0.01219, ecapa_loss=0.0001456, whisper_loss=0.09266, over 17693.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01109, ecapa_loss=0.0001707, whisper_loss=0.09125, over 3877403.99 frames. ], batch size: 70, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:06:11,479 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.552e+00 2024-08-12 17:06:15,982 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 17:06:35,003 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-12 17:06:39,367 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 17:07:00,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1736810.0, ans=0.125 2024-08-12 17:07:08,061 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-12 17:07:35,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1737010.0, ans=0.125 2024-08-12 17:07:44,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14300, loss[loss=0.1089, beats_loss=0.01258, ecapa_loss=0.000175, whisper_loss=0.09453, over 21117.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0111, ecapa_loss=0.0001695, whisper_loss=0.09117, over 3850595.41 frames. ], batch size: 87, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:07:48,516 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 13 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 17:07:56,606 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 17:08:04,469 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 14 from Vox, 49 fro AS 2024-08-12 17:08:10,784 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.074e+01 2.619e+01 2.822e+01 3.259e+01 8.695e+01, threshold=5.643e+01, percent-clipped=1.0 2024-08-12 17:08:12,604 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 17:08:28,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.14 vs. 
limit=22.5 2024-08-12 17:08:32,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1737310.0, ans=0.025 2024-08-12 17:08:53,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1737510.0, ans=0.0 2024-08-12 17:09:11,213 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14350, loss[loss=0.08411, beats_loss=0.01033, ecapa_loss=0.0002056, whisper_loss=0.07172, over 21224.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001705, whisper_loss=0.09157, over 3858389.09 frames. ], batch size: 92, lr: 5.15e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:09:12,669 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-12 17:09:17,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1737610.0, ans=0.2 2024-08-12 17:09:26,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1737710.0, ans=0.1 2024-08-12 17:09:40,943 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-12 17:09:42,703 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 17:09:53,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1737810.0, ans=0.125 2024-08-12 17:09:57,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2024-08-12 17:10:48,398 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 17:10:56,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1738010.0, ans=0.1 2024-08-12 17:11:13,084 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14400, loss[loss=0.08689, beats_loss=0.009757, ecapa_loss=0.0001553, whisper_loss=0.07558, over 13747.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001734, whisper_loss=0.09161, over 3834334.75 frames. ], batch size: 53, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:11:17,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1738110.0, ans=0.125 2024-08-12 17:11:19,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1738110.0, ans=0.125 2024-08-12 17:11:44,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.468e+01 2.751e+01 3.183e+01 4.709e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-12 17:12:03,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1738310.0, ans=0.2 2024-08-12 17:12:24,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2024-08-12 17:12:46,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1738510.0, ans=0.125 2024-08-12 17:12:50,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1738510.0, ans=0.1 2024-08-12 17:12:52,971 INFO [train_multi_KD3.py:1116] (0/4) Epoch 12, batch 14450, loss[loss=0.1193, beats_loss=0.00938, ecapa_loss=0.0001753, whisper_loss=0.1082, over 22066.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.000174, whisper_loss=0.0913, over 3872647.19 frames. ], batch size: 87, lr: 5.14e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:13:00,085 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-12 17:13:29,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1738810.0, ans=0.125 2024-08-12 17:13:35,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.46 vs. limit=22.5 2024-08-12 17:13:41,683 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 17:13:48,480 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-12 17:13:57,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1739010.0, ans=0.0 2024-08-12 17:14:09,336 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-12.pt 2024-08-12 17:15:01,343 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 0, loss[loss=0.09213, beats_loss=0.01222, ecapa_loss=0.0001678, whisper_loss=0.07823, over 19754.00 frames. ], tot_loss[loss=0.09213, beats_loss=0.01222, ecapa_loss=0.0001678, whisper_loss=0.07823, over 19754.00 frames. 
], batch size: 80, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:15:01,345 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 17:15:45,045 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on ASR_libri: loss=0.255, beats_loss=0, ecapa_loss=0.0005844, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 17:16:01,532 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on SV_voxceleb1: loss=0.004777, beats_loss=0, ecapa_loss=0.0004777, whisper_loss=0, over 939242.00 frames. 2024-08-12 17:16:42,540 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9615, 2.1592, 1.8561, 1.3761, 1.6949, 1.5264, 2.1038, 2.0655], device='cuda:0') 2024-08-12 17:17:58,198 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3140, 3.1077, 3.2198, 3.1302], device='cuda:0') 2024-08-12 17:18:04,492 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on AT_audioset: loss=0.02416, beats_loss=0.02416, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 17:18:04,500 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 17:18:55,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+01 2.525e+01 2.835e+01 3.382e+01 8.605e+01, threshold=5.671e+01, percent-clipped=1.0 2024-08-12 17:18:56,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1739180.0, ans=0.0 2024-08-12 17:18:58,934 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-12 17:19:13,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.19 vs. 
limit=15.0 2024-08-12 17:19:15,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-08-12 17:19:43,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1739380.0, ans=0.0 2024-08-12 17:20:18,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 50, loss[loss=0.101, beats_loss=0.01015, ecapa_loss=0.0001721, whisper_loss=0.08917, over 23848.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01024, ecapa_loss=0.0001787, whisper_loss=0.09039, over 853458.04 frames. ], batch size: 94, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:20:24,858 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-12 17:20:42,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1739680.0, ans=0.125 2024-08-12 17:20:54,579 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-12 17:21:02,041 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 17:21:44,391 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-12 17:22:14,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1739980.0, ans=0.125 2024-08-12 17:22:20,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 100, loss[loss=0.1077, beats_loss=0.01041, ecapa_loss=0.0001511, whisper_loss=0.09579, over 23681.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01025, ecapa_loss=0.0001778, whisper_loss=0.08957, over 1522597.83 frames. 
], batch size: 93, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:22:23,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1740080.0, ans=0.0 2024-08-12 17:22:39,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=22.5 2024-08-12 17:22:41,331 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 17:22:44,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. limit=6.0 2024-08-12 17:23:00,094 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-12 17:23:05,585 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+01 2.865e+01 3.060e+01 3.356e+01 6.213e+01, threshold=6.120e+01, percent-clipped=1.0 2024-08-12 17:23:13,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1740280.0, ans=0.2 2024-08-12 17:23:15,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1740280.0, ans=0.125 2024-08-12 17:23:19,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2024-08-12 17:23:31,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1740380.0, ans=0.125 2024-08-12 17:23:36,762 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:23:45,370 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-12 17:23:50,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1740380.0, ans=0.0 2024-08-12 17:24:15,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 150, loss[loss=0.1172, beats_loss=0.01036, ecapa_loss=0.0001738, whisper_loss=0.1051, over 15199.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01024, ecapa_loss=0.0001762, whisper_loss=0.09049, over 2030046.58 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:24:18,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-12 17:24:24,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.94 vs. limit=6.0 2024-08-12 17:24:26,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1740580.0, ans=0.2 2024-08-12 17:24:28,067 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 17:24:55,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2024-08-12 17:24:57,784 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-12 17:25:19,710 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 17:25:29,713 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-12 17:25:37,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1740980.0, ans=0.125 2024-08-12 17:25:42,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 200, loss[loss=0.1081, beats_loss=0.009957, ecapa_loss=0.0002139, whisper_loss=0.09598, over 13348.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0001752, whisper_loss=0.09123, over 2415177.27 frames. ], batch size: 56, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:25:52,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1741080.0, ans=0.035 2024-08-12 17:26:05,157 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 17:26:06,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1741180.0, ans=0.2 2024-08-12 17:26:11,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.594e+01 3.008e+01 3.381e+01 4.307e+01, threshold=6.015e+01, percent-clipped=0.0 2024-08-12 17:26:22,379 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-12 17:26:53,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1741480.0, ans=0.5 2024-08-12 17:26:56,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-12 17:27:00,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 250, loss[loss=0.08601, beats_loss=0.01459, ecapa_loss=0.0001585, whisper_loss=0.06983, over 18502.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01038, ecapa_loss=0.0001758, whisper_loss=0.09166, over 2694711.88 frames. 
], batch size: 75, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:27:00,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1741580.0, ans=0.0 2024-08-12 17:27:05,067 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-12 17:27:10,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1741580.0, ans=0.2 2024-08-12 17:27:11,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1741580.0, ans=0.1 2024-08-12 17:27:18,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1741680.0, ans=0.125 2024-08-12 17:27:35,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1741780.0, ans=0.07 2024-08-12 17:27:40,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1741780.0, ans=0.125 2024-08-12 17:27:41,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1741780.0, ans=0.0 2024-08-12 17:27:41,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1741780.0, ans=0.125 2024-08-12 17:27:46,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.59 vs. limit=22.5 2024-08-12 17:28:04,572 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 17:28:13,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1741980.0, ans=0.0 2024-08-12 17:28:17,807 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 300, loss[loss=0.1072, beats_loss=0.01068, ecapa_loss=0.0001494, whisper_loss=0.09503, over 17599.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001759, whisper_loss=0.09099, over 2943001.90 frames. ], batch size: 65, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:28:23,708 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 17:28:44,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.349e+01 2.732e+01 3.113e+01 6.634e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-12 17:28:57,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1742280.0, ans=0.125 2024-08-12 17:29:14,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=1742380.0, ans=0.1 2024-08-12 17:29:20,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-12 17:29:27,015 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 35 from Vox, 34 fro AS 2024-08-12 17:29:32,755 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 350, loss[loss=0.09411, beats_loss=0.009389, ecapa_loss=0.0001484, whisper_loss=0.08324, over 15127.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001756, whisper_loss=0.08975, over 3130978.46 frames. 
], batch size: 58, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:29:44,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.689e+05 2024-08-12 17:30:02,077 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-12 17:30:23,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1742880.0, ans=0.125 2024-08-12 17:30:30,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1742980.0, ans=0.125 2024-08-12 17:30:40,800 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-12 17:30:44,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2024-08-12 17:30:44,509 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 400, loss[loss=0.1027, beats_loss=0.01171, ecapa_loss=0.0001664, whisper_loss=0.08937, over 20209.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01064, ecapa_loss=0.0001741, whisper_loss=0.09042, over 3272956.89 frames. 
], batch size: 79, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:30:57,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1743080.0, ans=0.1 2024-08-12 17:31:02,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1743180.0, ans=0.125 2024-08-12 17:31:10,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.102e+01 2.525e+01 2.765e+01 3.244e+01 1.385e+02, threshold=5.529e+01, percent-clipped=2.0 2024-08-12 17:31:36,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1743380.0, ans=0.125 2024-08-12 17:31:49,427 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-12 17:31:58,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 450, loss[loss=0.1255, beats_loss=0.00601, ecapa_loss=0.0002403, whisper_loss=0.1171, over 15063.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001755, whisper_loss=0.09072, over 3384557.70 frames. ], batch size: 59, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:32:05,749 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-12 17:32:09,873 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-12 17:32:10,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1743580.0, ans=0.0 2024-08-12 17:32:10,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1743580.0, ans=0.0 2024-08-12 17:32:19,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743680.0, ans=0.1 2024-08-12 17:32:31,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1743780.0, ans=0.0 2024-08-12 17:32:33,272 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 17:32:41,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-08-12 17:32:55,633 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 17:33:01,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1743980.0, ans=0.1 2024-08-12 17:33:01,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1743980.0, ans=0.2 2024-08-12 17:33:11,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 500, loss[loss=0.1196, beats_loss=0.01009, ecapa_loss=0.0001853, whisper_loss=0.1076, over 23749.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001745, whisper_loss=0.09038, over 3491839.30 frames. 
], batch size: 95, lr: 4.94e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:33:15,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1744080.0, ans=0.2 2024-08-12 17:33:18,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2024-08-12 17:33:19,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1744080.0, ans=0.0 2024-08-12 17:33:27,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1744180.0, ans=0.04949747468305833 2024-08-12 17:33:39,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1744180.0, ans=0.125 2024-08-12 17:33:39,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1744180.0, ans=0.0 2024-08-12 17:33:40,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.531e+01 2.780e+01 3.170e+01 4.119e+01, threshold=5.561e+01, percent-clipped=0.0 2024-08-12 17:33:49,219 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 17:33:53,794 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-12 17:34:11,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1744380.0, ans=0.0 2024-08-12 17:34:11,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1744380.0, ans=0.1 2024-08-12 17:34:11,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. 
limit=15.0 2024-08-12 17:34:19,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1744480.0, ans=0.125 2024-08-12 17:34:30,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 550, loss[loss=0.08889, beats_loss=0.01339, ecapa_loss=0.0001635, whisper_loss=0.07386, over 22073.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0108, ecapa_loss=0.0001724, whisper_loss=0.08983, over 3565081.42 frames. ], batch size: 89, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:34:35,783 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 17:34:39,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-12 17:35:04,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1744780.0, ans=0.0 2024-08-12 17:35:14,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-12 17:35:44,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1745080.0, ans=0.125 2024-08-12 17:35:45,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 600, loss[loss=0.09968, beats_loss=0.01348, ecapa_loss=0.0001404, whisper_loss=0.08479, over 20272.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01088, ecapa_loss=0.0001698, whisper_loss=0.08923, over 3602073.69 frames. 
], batch size: 79, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:35:48,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1745080.0, ans=0.1 2024-08-12 17:35:56,786 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-12 17:35:58,676 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 17:36:07,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1745180.0, ans=0.1 2024-08-12 17:36:10,395 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-12 17:36:11,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.517e+01 2.834e+01 3.150e+01 6.498e+01, threshold=5.667e+01, percent-clipped=2.0 2024-08-12 17:36:12,963 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 27 from Vox, 18 fro AS 2024-08-12 17:36:19,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0 2024-08-12 17:36:25,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.62 vs. 
limit=15.0 2024-08-12 17:36:27,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1745280.0, ans=0.125 2024-08-12 17:36:32,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1745380.0, ans=0.125 2024-08-12 17:36:35,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1745380.0, ans=0.1 2024-08-12 17:36:45,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1745480.0, ans=0.0 2024-08-12 17:36:45,589 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2024-08-12 17:36:47,823 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 17:36:57,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 650, loss[loss=0.08839, beats_loss=0.01407, ecapa_loss=0.0001208, whisper_loss=0.07312, over 20772.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01088, ecapa_loss=0.0001693, whisper_loss=0.08957, over 3655619.57 frames. ], batch size: 82, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:37:07,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1745580.0, ans=0.125 2024-08-12 17:37:40,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1745880.0, ans=0.025 2024-08-12 17:37:44,330 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 17:37:53,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1745880.0, ans=0.0 2024-08-12 17:38:08,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1745980.0, ans=0.1 2024-08-12 17:38:10,846 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 700, loss[loss=0.1037, beats_loss=0.01136, ecapa_loss=0.0001612, whisper_loss=0.09075, over 23131.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01093, ecapa_loss=0.0001697, whisper_loss=0.09058, over 3709925.60 frames. ], batch size: 91, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:38:15,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1746080.0, ans=0.125 2024-08-12 17:38:33,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1746180.0, ans=0.2 2024-08-12 17:38:36,497 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 17:38:37,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.446e+01 2.651e+01 3.040e+01 5.006e+01, threshold=5.302e+01, percent-clipped=0.0 2024-08-12 17:38:52,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1746280.0, ans=0.04949747468305833 2024-08-12 17:39:12,716 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 17:39:14,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1746480.0, ans=0.0 2024-08-12 17:39:15,789 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 17:39:22,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2024-08-12 17:39:24,615 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 750, loss[loss=0.1083, beats_loss=0.01099, ecapa_loss=0.0001397, whisper_loss=0.09593, over 25061.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01098, ecapa_loss=0.0001684, whisper_loss=0.09068, over 3738911.46 frames. ], batch size: 93, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:39:28,761 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 17:39:49,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1746680.0, ans=0.1 2024-08-12 17:39:49,992 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 17:39:52,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1746780.0, ans=0.125 2024-08-12 17:39:53,949 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-12 17:40:16,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.47 vs. limit=15.0 2024-08-12 17:40:21,640 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 17:40:24,585 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 17:40:28,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1746980.0, ans=0.2 2024-08-12 17:40:37,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 800, loss[loss=0.1124, beats_loss=0.009218, ecapa_loss=0.0001701, whisper_loss=0.1015, over 19516.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001682, whisper_loss=0.09074, over 3743429.93 frames. ], batch size: 74, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:40:59,855 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 17:41:03,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.398e+01 2.726e+01 3.050e+01 4.286e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-12 17:41:04,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.75 vs. limit=10.0 2024-08-12 17:41:16,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1747280.0, ans=0.125 2024-08-12 17:41:23,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1747380.0, ans=0.125 2024-08-12 17:41:26,181 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 17:41:31,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1747380.0, ans=0.125 2024-08-12 17:41:31,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.07 vs. 
limit=15.0 2024-08-12 17:41:33,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1747380.0, ans=0.025 2024-08-12 17:41:42,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1747480.0, ans=0.1 2024-08-12 17:41:51,284 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 850, loss[loss=0.1135, beats_loss=0.00861, ecapa_loss=0.0001756, whisper_loss=0.1031, over 18123.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01098, ecapa_loss=0.0001678, whisper_loss=0.09014, over 3778415.50 frames. ], batch size: 68, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:41:53,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1747580.0, ans=0.125 2024-08-12 17:42:04,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-08-12 17:42:12,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1747680.0, ans=0.0 2024-08-12 17:42:22,076 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 17:42:29,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1747780.0, ans=0.125 2024-08-12 17:42:31,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-08-12 17:43:06,044 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 900, loss[loss=0.1017, beats_loss=0.00983, ecapa_loss=0.0001817, whisper_loss=0.0901, over 16630.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001678, whisper_loss=0.09063, over 3779126.43 frames. ], batch size: 66, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:43:14,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1748080.0, ans=0.125 2024-08-12 17:43:15,227 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-12 17:43:32,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.406e+01 2.653e+01 2.914e+01 6.572e+01, threshold=5.306e+01, percent-clipped=1.0 2024-08-12 17:43:36,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-12 17:44:00,000 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 17:44:17,619 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 950, loss[loss=0.1036, beats_loss=0.01036, ecapa_loss=0.0002, whisper_loss=0.09127, over 21962.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001686, whisper_loss=0.09073, over 3807850.01 frames. ], batch size: 88, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:44:30,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=15.0 2024-08-12 17:44:58,838 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 17:45:04,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1748880.0, ans=0.125 2024-08-12 17:45:23,800 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 17:45:27,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1000, loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.0001657, whisper_loss=0.09184, over 21680.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01084, ecapa_loss=0.0001675, whisper_loss=0.08979, over 3783568.05 frames. ], batch size: 87, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:45:34,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=12.0 2024-08-12 17:45:50,859 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 17:45:53,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.479e+01 2.731e+01 3.171e+01 4.511e+01, threshold=5.462e+01, percent-clipped=0.0 2024-08-12 17:45:57,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1749280.0, ans=0.0 2024-08-12 17:46:00,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1749280.0, ans=0.0 2024-08-12 17:46:29,016 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 17:46:33,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0 2024-08-12 17:46:34,431 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 17:46:41,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1050, loss[loss=0.1364, beats_loss=0.006499, ecapa_loss=0.0001428, whisper_loss=0.1285, over 19964.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01075, ecapa_loss=0.0001661, whisper_loss=0.09055, over 3778369.99 frames. 
], batch size: 72, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:46:48,185 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 17:46:59,074 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-12 17:47:10,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5 2024-08-12 17:47:11,893 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-12 17:47:41,933 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 17:47:42,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1749980.0, ans=0.125 2024-08-12 17:47:49,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1749980.0, ans=0.0 2024-08-12 17:47:57,856 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1100, loss[loss=0.1177, beats_loss=0.01088, ecapa_loss=0.0001726, whisper_loss=0.1051, over 22771.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01077, ecapa_loss=0.000166, whisper_loss=0.09082, over 3775580.80 frames. 
], batch size: 90, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:48:07,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1750080.0, ans=0.1 2024-08-12 17:48:15,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1750180.0, ans=0.125 2024-08-12 17:48:16,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1750180.0, ans=0.0 2024-08-12 17:48:24,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.582e+01 2.825e+01 3.154e+01 4.424e+01, threshold=5.651e+01, percent-clipped=0.0 2024-08-12 17:48:42,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1750280.0, ans=0.125 2024-08-12 17:48:48,354 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 17:49:13,619 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-12 17:49:23,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1150, loss[loss=0.1092, beats_loss=0.01064, ecapa_loss=0.0001561, whisper_loss=0.09696, over 15024.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001661, whisper_loss=0.09097, over 3821914.19 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:49:41,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=15.0 2024-08-12 17:49:43,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=1750680.0, ans=0.1 2024-08-12 17:49:49,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2024-08-12 17:50:07,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1750780.0, ans=0.2 2024-08-12 17:50:10,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-12 17:50:22,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1750880.0, ans=0.125 2024-08-12 17:50:41,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1750980.0, ans=0.0 2024-08-12 17:50:51,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1200, loss[loss=0.07442, beats_loss=0.0119, ecapa_loss=0.0001842, whisper_loss=0.06068, over 14495.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001665, whisper_loss=0.09097, over 3803449.28 frames. ], batch size: 59, lr: 4.93e-03, grad_scale: 5.764607523034235e+17 2024-08-12 17:50:58,777 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-12 17:51:01,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1751080.0, ans=0.125 2024-08-12 17:51:05,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1751080.0, ans=0.1 2024-08-12 17:51:11,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=15.0 2024-08-12 17:51:12,431 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 17:51:27,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.378e+01 2.599e+01 3.054e+01 4.994e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-12 17:51:37,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1751280.0, ans=0.125 2024-08-12 17:52:23,495 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 17:52:37,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1250, loss[loss=0.1234, beats_loss=0.009198, ecapa_loss=0.0001628, whisper_loss=0.1126, over 16950.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01094, ecapa_loss=0.0001654, whisper_loss=0.09111, over 3796489.12 frames. 
], batch size: 64, lr: 4.93e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:52:40,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1751580.0, ans=0.1 2024-08-12 17:52:45,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1751580.0, ans=0.125 2024-08-12 17:53:03,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1751680.0, ans=0.125 2024-08-12 17:53:16,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1751680.0, ans=0.2 2024-08-12 17:53:47,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2024-08-12 17:53:49,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1751880.0, ans=0.125 2024-08-12 17:53:50,834 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 17:53:52,339 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-12 17:53:59,267 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 17:54:16,815 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 17:54:23,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1751980.0, ans=0.0 2024-08-12 17:54:27,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1300, loss[loss=0.09217, beats_loss=0.0125, ecapa_loss=0.0001526, whisper_loss=0.07814, over 18881.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001668, whisper_loss=0.09097, over 3817062.24 frames. ], batch size: 77, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:55:06,907 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.406e+01 2.650e+01 2.964e+01 4.612e+01, threshold=5.300e+01, percent-clipped=0.0 2024-08-12 17:55:17,054 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 17:55:40,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1752380.0, ans=0.0 2024-08-12 17:56:02,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2024-08-12 17:56:10,081 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 17:56:14,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1350, loss[loss=0.09083, beats_loss=0.008648, ecapa_loss=0.0001474, whisper_loss=0.08071, over 14784.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.0001675, whisper_loss=0.09122, over 3834122.15 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:56:15,219 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 17:56:38,677 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 17:57:09,221 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-12 17:57:12,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=1752880.0, ans=0.02 2024-08-12 17:57:15,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.06 vs. limit=22.5 2024-08-12 17:57:19,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1752880.0, ans=0.05 2024-08-12 17:57:20,645 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 17:57:38,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1400, loss[loss=0.1027, beats_loss=0.01098, ecapa_loss=0.0001518, whisper_loss=0.09024, over 23225.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.0001684, whisper_loss=0.0909, over 3847941.88 frames. ], batch size: 92, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:57:38,250 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 17:57:45,160 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 14 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 17:57:53,004 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
31 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 17:58:04,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.419e+01 2.702e+01 3.143e+01 2.017e+02, threshold=5.404e+01, percent-clipped=3.0 2024-08-12 17:58:36,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1753480.0, ans=0.125 2024-08-12 17:58:42,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1753480.0, ans=0.125 2024-08-12 17:58:43,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-12 17:59:02,985 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1450, loss[loss=0.08465, beats_loss=0.01312, ecapa_loss=0.000141, whisper_loss=0.07012, over 17026.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01088, ecapa_loss=0.0001663, whisper_loss=0.09049, over 3843732.40 frames. ], batch size: 67, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 17:59:03,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1753580.0, ans=0.125 2024-08-12 17:59:14,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1753580.0, ans=0.0 2024-08-12 17:59:16,132 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 13 from Vox, 50 fro AS 2024-08-12 17:59:31,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1753680.0, ans=0.0 2024-08-12 17:59:33,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1753780.0, ans=0.0 2024-08-12 17:59:44,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-12 17:59:47,499 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 17:59:50,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1753880.0, ans=0.0 2024-08-12 18:00:15,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1753980.0, ans=0.0 2024-08-12 18:00:18,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=1753980.0, ans=0.025 2024-08-12 18:00:21,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1500, loss[loss=0.08102, beats_loss=0.01542, ecapa_loss=0.0001337, whisper_loss=0.06426, over 21091.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01097, ecapa_loss=0.0001669, whisper_loss=0.09005, over 3852819.96 frames. ], batch size: 89, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:00:32,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=1754080.0, ans=0.05 2024-08-12 18:00:38,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. 
limit=15.0 2024-08-12 18:00:50,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.459e+01 2.780e+01 3.185e+01 5.902e+01, threshold=5.561e+01, percent-clipped=1.0 2024-08-12 18:01:04,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-12 18:01:08,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1754380.0, ans=0.1 2024-08-12 18:01:41,424 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1550, loss[loss=0.09668, beats_loss=0.0117, ecapa_loss=0.0001472, whisper_loss=0.08351, over 17922.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01101, ecapa_loss=0.0001652, whisper_loss=0.09032, over 3848053.34 frames. ], batch size: 69, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:01:48,140 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 18:02:01,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1754680.0, ans=0.0 2024-08-12 18:02:42,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1754980.0, ans=0.1 2024-08-12 18:02:55,263 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-12 18:02:58,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1600, loss[loss=0.1082, beats_loss=0.00825, ecapa_loss=0.000174, whisper_loss=0.09823, over 14734.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01099, ecapa_loss=0.0001643, whisper_loss=0.09004, over 3860621.44 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:03:04,582 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
19 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 18:03:25,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.499e+01 2.878e+01 3.295e+01 8.050e+01, threshold=5.757e+01, percent-clipped=1.0 2024-08-12 18:03:28,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1755280.0, ans=0.0 2024-08-12 18:03:44,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-08-12 18:03:47,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1755380.0, ans=0.0 2024-08-12 18:03:55,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1755380.0, ans=0.125 2024-08-12 18:04:07,695 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 18:04:10,642 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-12 18:04:14,005 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1650, loss[loss=0.09771, beats_loss=0.01039, ecapa_loss=0.000157, whisper_loss=0.08575, over 17257.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01102, ecapa_loss=0.0001638, whisper_loss=0.09041, over 3883164.25 frames. ], batch size: 70, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:04:14,204 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 39 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 18:04:34,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=1755680.0, ans=12.0 2024-08-12 18:04:36,959 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 18:04:37,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1755680.0, ans=0.2 2024-08-12 18:05:10,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-12 18:05:20,948 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.091e+01 2024-08-12 18:05:28,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1756080.0, ans=0.125 2024-08-12 18:05:28,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1756080.0, ans=0.1 2024-08-12 18:05:29,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1700, loss[loss=0.105, beats_loss=0.01139, ecapa_loss=0.0001674, whisper_loss=0.09191, over 17607.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.000165, whisper_loss=0.09085, over 3865602.27 frames. ], batch size: 69, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:05:30,894 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 18:05:33,714 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:05:36,776 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-12 18:05:48,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1756180.0, ans=0.0 2024-08-12 18:05:53,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1756180.0, ans=0.0 2024-08-12 18:05:56,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.398e+01 2.715e+01 2.937e+01 4.103e+01, threshold=5.430e+01, percent-clipped=0.0 2024-08-12 18:06:06,496 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-12 18:06:21,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2024-08-12 18:06:35,771 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 18:06:41,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2024-08-12 18:06:42,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1750, loss[loss=0.1113, beats_loss=0.01097, ecapa_loss=0.0001395, whisper_loss=0.09893, over 23676.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.09128, over 3864971.61 frames. ], batch size: 93, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:06:45,440 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-12 18:06:55,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1756680.0, ans=0.1 2024-08-12 18:06:56,895 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
37 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-12 18:07:04,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1756680.0, ans=0.0 2024-08-12 18:07:05,676 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-12 18:07:15,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1756780.0, ans=0.0 2024-08-12 18:07:18,398 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 20 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-12 18:07:20,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1756780.0, ans=0.125 2024-08-12 18:07:22,942 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 18:07:38,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-12 18:07:55,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1800, loss[loss=0.1138, beats_loss=0.008801, ecapa_loss=0.0001809, whisper_loss=0.1032, over 19576.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.000166, whisper_loss=0.09193, over 3870329.49 frames. ], batch size: 78, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:08:02,690 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 18:08:09,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1757180.0, ans=0.1 2024-08-12 18:08:11,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.85 vs. 
limit=15.0 2024-08-12 18:08:16,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1757180.0, ans=0.1 2024-08-12 18:08:18,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2024-08-12 18:08:21,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.466e+01 2.734e+01 3.019e+01 6.645e+01, threshold=5.468e+01, percent-clipped=2.0 2024-08-12 18:08:30,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1757280.0, ans=0.0 2024-08-12 18:08:59,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1757480.0, ans=0.0 2024-08-12 18:09:03,588 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 18:09:08,941 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1850, loss[loss=0.09314, beats_loss=0.01231, ecapa_loss=0.0001561, whisper_loss=0.07927, over 20672.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001663, whisper_loss=0.09143, over 3866635.89 frames. 
], batch size: 84, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:09:18,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1757580.0, ans=0.0 2024-08-12 18:09:19,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1757580.0, ans=0.125 2024-08-12 18:09:25,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1757680.0, ans=0.0 2024-08-12 18:09:29,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1757680.0, ans=0.0 2024-08-12 18:09:47,782 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 18:09:58,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1757880.0, ans=0.125 2024-08-12 18:09:58,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0 2024-08-12 18:10:00,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2024-08-12 18:10:01,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1757880.0, ans=0.07 2024-08-12 18:10:07,776 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 18:10:10,304 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-12 18:10:20,662 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1900, loss[loss=0.1063, beats_loss=0.01166, ecapa_loss=0.0001731, whisper_loss=0.09294, over 22592.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001669, whisper_loss=0.09133, over 3868246.45 frames. ], batch size: 89, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:10:30,857 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 18:10:47,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.395e+01 2.725e+01 3.038e+01 6.504e+01, threshold=5.449e+01, percent-clipped=3.0 2024-08-12 18:11:02,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-12 18:11:06,293 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 18:11:06,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1758380.0, ans=0.2 2024-08-12 18:11:34,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 1950, loss[loss=0.1083, beats_loss=0.01158, ecapa_loss=0.0001887, whisper_loss=0.09486, over 17274.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01081, ecapa_loss=0.0001691, whisper_loss=0.09092, over 3856772.74 frames. ], batch size: 70, lr: 4.92e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:11:44,321 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-12 18:11:47,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1758680.0, ans=0.1 2024-08-12 18:12:19,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1758880.0, ans=0.125 2024-08-12 18:12:24,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=15.0 2024-08-12 18:12:24,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0 2024-08-12 18:12:30,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1758880.0, ans=0.0 2024-08-12 18:12:48,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2000, loss[loss=0.1106, beats_loss=0.009636, ecapa_loss=0.0001691, whisper_loss=0.09932, over 21379.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01084, ecapa_loss=0.0001686, whisper_loss=0.09089, over 3817903.08 frames. ], batch size: 83, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:12:53,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1759080.0, ans=0.125 2024-08-12 18:12:57,152 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-12 18:12:57,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2024-08-12 18:13:07,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.60 vs. 
limit=15.0 2024-08-12 18:13:15,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.512e+01 2.812e+01 3.299e+01 5.299e+01, threshold=5.623e+01, percent-clipped=0.0 2024-08-12 18:13:17,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1759280.0, ans=0.125 2024-08-12 18:13:24,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759280.0, ans=0.1 2024-08-12 18:13:24,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=12.0 2024-08-12 18:13:26,753 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 17 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 18:13:31,068 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-12 18:13:31,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2024-08-12 18:13:32,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1759380.0, ans=0.0 2024-08-12 18:13:48,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1759480.0, ans=0.1 2024-08-12 18:14:01,998 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2050, loss[loss=0.1014, beats_loss=0.01304, ecapa_loss=0.000128, whisper_loss=0.0871, over 19083.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01088, ecapa_loss=0.0001693, whisper_loss=0.09031, over 3796079.72 frames. 
], batch size: 75, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:14:26,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1759680.0, ans=0.125 2024-08-12 18:14:39,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1759780.0, ans=0.125 2024-08-12 18:14:51,024 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 18:15:02,471 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-176000.pt 2024-08-12 18:15:09,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1759980.0, ans=0.125 2024-08-12 18:15:17,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1760080.0, ans=0.125 2024-08-12 18:15:18,219 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2100, loss[loss=0.1176, beats_loss=0.00939, ecapa_loss=0.0001509, whisper_loss=0.1067, over 16784.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01089, ecapa_loss=0.0001695, whisper_loss=0.09029, over 3812741.73 frames. ], batch size: 63, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:15:36,621 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-12 18:15:43,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.436e+01 2.700e+01 3.111e+01 5.079e+01, threshold=5.401e+01, percent-clipped=0.0 2024-08-12 18:15:47,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1760280.0, ans=0.125 2024-08-12 18:16:05,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2024-08-12 18:16:09,223 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 23 from Vox, 22 from AS 2024-08-12 18:16:23,357 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 from AS 2024-08-12 18:16:27,996 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.194e+01 2024-08-12 18:16:30,214 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2150, loss[loss=0.1255, beats_loss=0.01122, ecapa_loss=0.0001337, whisper_loss=0.113, over 20114.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.0001684, whisper_loss=0.09028, over 3842412.90 frames. ], batch size: 75, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:16:32,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1760580.0, ans=0.0 2024-08-12 18:16:36,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-08-12 18:16:47,414 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
19 from LS+wenet, 20 from Vox, 18 from AS 2024-08-12 18:16:57,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1760780.0, ans=0.1 2024-08-12 18:16:57,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1760780.0, ans=0.0 2024-08-12 18:17:05,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1760780.0, ans=0.2 2024-08-12 18:17:23,987 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 from AS 2024-08-12 18:17:37,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2200, loss[loss=0.1098, beats_loss=0.01156, ecapa_loss=0.0001688, whisper_loss=0.09657, over 22773.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001681, whisper_loss=0.09097, over 3837654.87 frames. ], batch size: 89, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:17:38,779 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 from AS 2024-08-12 18:17:58,297 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 25 from Vox, 32 from AS 2024-08-12 18:18:00,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.455e+01 2.695e+01 3.002e+01 4.139e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-12 18:18:07,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-12 18:18:08,246 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
19 from LS+wenet, 15 from Vox, 29 from AS 2024-08-12 18:18:08,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1761280.0, ans=0.125 2024-08-12 18:18:13,546 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 18 from LS+wenet, 31 from Vox, 40 from AS 2024-08-12 18:18:13,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1761280.0, ans=0.125 2024-08-12 18:18:17,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1761380.0, ans=0.2 2024-08-12 18:18:33,426 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 19 from Vox, 32 from AS 2024-08-12 18:18:35,881 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 from AS 2024-08-12 18:18:42,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2250, loss[loss=0.1102, beats_loss=0.006492, ecapa_loss=0.0002111, whisper_loss=0.1016, over 17869.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001692, whisper_loss=0.0916, over 3824161.83 frames. ], batch size: 70, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:18:48,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2024-08-12 18:19:11,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1761780.0, ans=0.05 2024-08-12 18:19:25,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1761880.0, ans=0.1 2024-08-12 18:19:30,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=10.0 2024-08-12 18:19:32,930 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 from AS 2024-08-12 18:19:46,169 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 from AS 2024-08-12 18:19:47,229 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2300, loss[loss=0.1112, beats_loss=0.0118, ecapa_loss=0.0002072, whisper_loss=0.09736, over 22366.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001688, whisper_loss=0.09171, over 3863764.92 frames. ], batch size: 94, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:20:00,413 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 from AS 2024-08-12 18:20:01,675 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 from AS 2024-08-12 18:20:03,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1762180.0, ans=10.0 2024-08-12 18:20:10,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.458e+01 2.734e+01 3.155e+01 5.696e+01, threshold=5.468e+01, percent-clipped=1.0 2024-08-12 18:20:11,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.59 vs. 
limit=15.0 2024-08-12 18:20:20,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1762280.0, ans=0.0 2024-08-12 18:20:23,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1762280.0, ans=0.125 2024-08-12 18:20:36,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1762380.0, ans=0.0 2024-08-12 18:20:44,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-08-12 18:20:48,013 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 10 from Vox, 24 from AS 2024-08-12 18:20:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1762480.0, ans=0.1 2024-08-12 18:20:52,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2350, loss[loss=0.1143, beats_loss=0.0122, ecapa_loss=0.0001706, whisper_loss=0.1004, over 22160.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.00017, whisper_loss=0.09174, over 3833861.51 frames. 
], batch size: 91, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:20:53,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1762580.0, ans=0.125 2024-08-12 18:21:15,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1762680.0, ans=0.0 2024-08-12 18:21:19,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1762780.0, ans=0.04949747468305833 2024-08-12 18:21:26,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1762780.0, ans=0.1 2024-08-12 18:21:40,696 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.196e+00 2024-08-12 18:21:54,746 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 14 from Vox, 33 from AS 2024-08-12 18:21:55,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1762980.0, ans=0.125 2024-08-12 18:21:57,322 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 from AS 2024-08-12 18:21:58,465 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2400, loss[loss=0.08766, beats_loss=0.00993, ecapa_loss=0.0001454, whisper_loss=0.07628, over 15329.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01095, ecapa_loss=0.00017, whisper_loss=0.09221, over 3844515.90 frames. 
], batch size: 55, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:22:03,902 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05874495208263397, model_norm_threshold=54.68092727661133 2024-08-12 18:22:04,081 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.484e+05, grad_sumsq=9.566e+04, orig_rms_sq=8.869e+00 2024-08-12 18:22:05,659 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 from AS 2024-08-12 18:22:11,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1763180.0, ans=0.125 2024-08-12 18:22:22,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.513e+01 2.845e+01 3.166e+01 9.308e+02, threshold=5.690e+01, percent-clipped=1.0 2024-08-12 18:22:28,076 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 from AS 2024-08-12 18:22:42,478 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS 2024-08-12 18:22:50,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1763480.0, ans=0.125 2024-08-12 18:22:50,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-12 18:23:01,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1763480.0, ans=0.0 2024-08-12 18:23:04,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2450, loss[loss=0.1038, beats_loss=0.01037, ecapa_loss=0.0001705, whisper_loss=0.09169, over 21937.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01101, ecapa_loss=0.0001705, whisper_loss=0.09107, over 3867227.23 frames. ], batch size: 89, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:23:04,680 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 from AS 2024-08-12 18:23:09,559 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 17 from LS+wenet, 21 from Vox, 40 from AS 2024-08-12 18:23:12,521 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 18:23:12,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1763580.0, ans=0.0 2024-08-12 18:23:16,669 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 from AS 2024-08-12 18:23:35,340 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.440e+01 2024-08-12 18:23:41,614 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 37 from LS+wenet, 19 from Vox, 38 from AS 2024-08-12 18:24:03,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=12.0 2024-08-12 18:24:04,900 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 from AS 2024-08-12 18:24:05,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1763980.0, ans=0.1 2024-08-12 18:24:07,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1763980.0, ans=0.125 2024-08-12 18:24:09,638 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2500, loss[loss=0.1123, beats_loss=0.01201, ecapa_loss=0.0001575, whisper_loss=0.09873, over 23693.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01095, ecapa_loss=0.0001709, whisper_loss=0.09135, over 3850580.97 frames. 
], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:24:25,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1764180.0, ans=0.025 2024-08-12 18:24:28,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=15.0 2024-08-12 18:24:32,894 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.522e+01 2.839e+01 3.431e+01 9.983e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 18:24:35,879 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 27 from LS+wenet, 15 from Vox, 17 from AS 2024-08-12 18:24:37,302 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 from AS 2024-08-12 18:24:45,313 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 from AS 2024-08-12 18:24:51,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1764380.0, ans=0.125 2024-08-12 18:25:07,557 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-12 18:25:15,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2550, loss[loss=0.1076, beats_loss=0.00896, ecapa_loss=0.0001704, whisper_loss=0.09694, over 14339.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001715, whisper_loss=0.09204, over 3853506.63 frames. ], batch size: 55, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:25:17,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. 
limit=15.0 2024-08-12 18:25:18,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1764580.0, ans=0.0 2024-08-12 18:25:22,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1764580.0, ans=0.125 2024-08-12 18:25:26,254 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 from AS 2024-08-12 18:25:29,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1764680.0, ans=0.1 2024-08-12 18:25:30,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1764680.0, ans=0.125 2024-08-12 18:25:40,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1764780.0, ans=0.1 2024-08-12 18:25:46,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1764780.0, ans=0.0 2024-08-12 18:25:57,300 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 18:25:59,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1764880.0, ans=0.0 2024-08-12 18:26:05,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1764880.0, ans=0.2 2024-08-12 18:26:07,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1764980.0, ans=0.2 2024-08-12 18:26:17,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. 
limit=6.0 2024-08-12 18:26:20,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2600, loss[loss=0.105, beats_loss=0.01379, ecapa_loss=0.0001775, whisper_loss=0.08948, over 21686.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001715, whisper_loss=0.09232, over 3840275.06 frames. ], batch size: 92, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:26:21,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2024-08-12 18:26:32,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1765180.0, ans=10.0 2024-08-12 18:26:41,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1765180.0, ans=0.125 2024-08-12 18:26:43,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.522e+01 2.874e+01 3.178e+01 1.791e+02, threshold=5.747e+01, percent-clipped=2.0 2024-08-12 18:26:43,911 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 from AS 2024-08-12 18:26:45,556 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 18 from Vox, 41 from AS 2024-08-12 18:26:57,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1765280.0, ans=0.125 2024-08-12 18:26:59,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1765380.0, ans=0.125 2024-08-12 18:27:19,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-12 18:27:20,428 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 15 from Vox, 46 from AS 2024-08-12 18:27:25,512 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2650, loss[loss=0.08786, beats_loss=0.01212, ecapa_loss=0.0001827, whisper_loss=0.07391, over 15966.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001701, whisper_loss=0.09234, over 3842123.76 frames. ], batch size: 62, lr: 4.91e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:27:39,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1765680.0, ans=0.05 2024-08-12 18:27:44,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1765680.0, ans=0.125 2024-08-12 18:27:46,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1765680.0, ans=0.1 2024-08-12 18:27:51,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1765780.0, ans=0.1 2024-08-12 18:27:54,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1765780.0, ans=0.0 2024-08-12 18:27:59,724 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 34 from LS+wenet, 21 from Vox, 29 from AS 2024-08-12 18:28:22,420 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 13 from Vox, 21 from AS 2024-08-12 18:28:28,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1765980.0, ans=0.125 2024-08-12 18:28:29,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1765980.0, ans=0.125 2024-08-12 18:28:31,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2700, loss[loss=0.07687, beats_loss=0.009736, ecapa_loss=0.0001355, whisper_loss=0.06578, over 16190.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001707, whisper_loss=0.09165, over 3835132.60 frames. ], batch size: 61, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:28:34,083 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 29 from Vox, 38 from AS 2024-08-12 18:28:54,731 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.345e+01 2.624e+01 3.036e+01 4.476e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 18:29:24,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1766480.0, ans=0.125 2024-08-12 18:29:30,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1766480.0, ans=0.1 2024-08-12 18:29:30,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1766480.0, ans=0.0 2024-08-12 18:29:34,232 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 from AS 2024-08-12 18:29:36,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2750, loss[loss=0.1133, beats_loss=0.01158, ecapa_loss=0.000152, whisper_loss=0.1002, over 23185.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01097, ecapa_loss=0.0001698, whisper_loss=0.09143, over 3854533.66 frames. 
], batch size: 89, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:29:43,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1766580.0, ans=0.125 2024-08-12 18:29:53,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1766680.0, ans=0.125 2024-08-12 18:29:58,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1766680.0, ans=0.0 2024-08-12 18:30:03,860 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 from AS 2024-08-12 18:30:08,986 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 from AS 2024-08-12 18:30:10,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1766780.0, ans=0.2 2024-08-12 18:30:21,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1766880.0, ans=0.0 2024-08-12 18:30:25,529 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 20 from Vox, 26 from AS 2024-08-12 18:30:32,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=1766980.0, ans=10.0 2024-08-12 18:30:33,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1766980.0, ans=0.125 2024-08-12 18:30:39,676 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 from AS 2024-08-12 18:30:42,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2800, loss[loss=0.09224, beats_loss=0.01037, ecapa_loss=0.0001875, whisper_loss=0.08, over 17048.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001706, whisper_loss=0.09194, over 3845482.13 frames. 
], batch size: 68, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:30:44,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1767080.0, ans=0.1 2024-08-12 18:30:54,192 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 from AS 2024-08-12 18:30:55,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1767180.0, ans=0.0 2024-08-12 18:30:55,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1767180.0, ans=0.125 2024-08-12 18:31:05,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-12 18:31:06,289 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.517e+01 2.667e+01 2.964e+01 5.320e+01, threshold=5.335e+01, percent-clipped=1.0 2024-08-12 18:31:06,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.02 vs. limit=22.5 2024-08-12 18:31:21,311 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 30 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 18:31:24,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1767380.0, ans=0.0 2024-08-12 18:31:30,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1767380.0, ans=0.0 2024-08-12 18:31:42,397 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-12 18:31:43,750 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
19 from LS+wenet, 18 from Vox, 34 from AS 2024-08-12 18:31:48,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2850, loss[loss=0.1058, beats_loss=0.0116, ecapa_loss=0.0001664, whisper_loss=0.09257, over 23529.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01094, ecapa_loss=0.0001705, whisper_loss=0.09184, over 3840893.07 frames. ], batch size: 93, lr: 4.90e-03, grad_scale: 1.152921504606847e+18 2024-08-12 18:31:50,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1767580.0, ans=0.2 2024-08-12 18:31:51,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1767580.0, ans=0.1 2024-08-12 18:31:58,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0 2024-08-12 18:32:08,535 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 from AS 2024-08-12 18:32:16,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1767780.0, ans=0.0 2024-08-12 18:32:25,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1767780.0, ans=0.125 2024-08-12 18:32:32,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.08 vs. 
limit=5.0 2024-08-12 18:32:38,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1767880.0, ans=0.2 2024-08-12 18:32:50,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1767980.0, ans=0.0 2024-08-12 18:32:53,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2900, loss[loss=0.1368, beats_loss=0.008725, ecapa_loss=0.0001549, whisper_loss=0.1265, over 19613.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01089, ecapa_loss=0.0001707, whisper_loss=0.09231, over 3880249.80 frames. ], batch size: 75, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:32:59,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1768080.0, ans=0.0 2024-08-12 18:33:06,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1768180.0, ans=0.125 2024-08-12 18:33:13,815 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 13 from Vox, 37 from AS 2024-08-12 18:33:19,015 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.482e+01 2.869e+01 3.422e+01 8.599e+01, threshold=5.738e+01, percent-clipped=1.0 2024-08-12 18:33:27,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1768280.0, ans=0.125 2024-08-12 18:33:28,701 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 from AS 2024-08-12 18:33:36,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2024-08-12 18:33:39,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1768380.0, ans=0.125 2024-08-12 18:33:45,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1768380.0, ans=0.1 2024-08-12 18:33:45,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1768380.0, ans=0.125 2024-08-12 18:34:00,685 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 2950, loss[loss=0.09379, beats_loss=0.01212, ecapa_loss=0.0001367, whisper_loss=0.0803, over 19363.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001703, whisper_loss=0.09159, over 3856905.09 frames. ], batch size: 75, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:34:29,502 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 from AS 2024-08-12 18:34:32,929 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 11 from Vox, 33 from AS 2024-08-12 18:34:45,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1768880.0, ans=0.125 2024-08-12 18:34:49,353 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 from AS 2024-08-12 18:34:57,208 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 11 from Vox, 26 from AS 2024-08-12 18:35:10,461 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3000, loss[loss=0.09856, beats_loss=0.01189, ecapa_loss=0.0001814, whisper_loss=0.08486, over 14088.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01089, ecapa_loss=0.0001714, whisper_loss=0.09276, over 3870713.83 frames. 
], batch size: 57, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:35:10,462 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 18:35:46,405 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005879, whisper_loss=0.2492, over 922467.00 frames. 2024-08-12 18:36:04,773 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on SV_voxceleb1: loss=0.004639, beats_loss=0, ecapa_loss=0.0004639, whisper_loss=0, over 939242.00 frames. 2024-08-12 18:36:29,434 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1689, 1.9057, 1.9581, 1.8835], device='cuda:0') 2024-08-12 18:37:53,559 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on AT_audioset: loss=0.02413, beats_loss=0.02413, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 18:37:53,564 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 18:37:59,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-12 18:38:05,451 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 from AS 2024-08-12 18:38:10,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=12.0 2024-08-12 18:38:11,670 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 18:38:18,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.438e+01 2.713e+01 3.016e+01 4.001e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-12 18:38:56,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1769480.0, ans=0.125 2024-08-12 18:38:59,966 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3050, loss[loss=0.1332, beats_loss=0.008051, ecapa_loss=0.0001955, whisper_loss=0.1232, over 24008.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01085, ecapa_loss=0.0001731, whisper_loss=0.09317, over 3896251.61 frames. ], batch size: 93, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:39:00,084 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 20 from Vox, 43 from AS 2024-08-12 18:39:01,587 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 from AS 2024-08-12 18:39:05,873 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 from AS 2024-08-12 18:39:10,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0 2024-08-12 18:39:29,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1769780.0, ans=0.1 2024-08-12 18:39:33,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1769780.0, ans=0.0 2024-08-12 18:39:59,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.52 vs.
limit=15.0 2024-08-12 18:40:09,373 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3100, loss[loss=0.1152, beats_loss=0.01101, ecapa_loss=0.0001593, whisper_loss=0.1026, over 18632.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01092, ecapa_loss=0.0001735, whisper_loss=0.09307, over 3924353.18 frames. ], batch size: 74, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:40:10,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1770080.0, ans=0.125 2024-08-12 18:40:20,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-08-12 18:40:27,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.64 vs. limit=22.5 2024-08-12 18:40:34,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2024-08-12 18:40:36,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.048e+01 2.497e+01 2.868e+01 3.286e+01 7.289e+01, threshold=5.737e+01, percent-clipped=2.0 2024-08-12 18:40:38,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1770280.0, ans=0.1 2024-08-12 18:40:44,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. 
limit=6.0 2024-08-12 18:41:15,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1770480.0, ans=0.125 2024-08-12 18:41:21,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3150, loss[loss=0.087, beats_loss=0.01151, ecapa_loss=0.0001356, whisper_loss=0.07413, over 16310.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01102, ecapa_loss=0.0001723, whisper_loss=0.09165, over 3914330.18 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:41:35,700 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 14 from LS+wenet, 23 from Vox, 44 from AS 2024-08-12 18:41:48,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1770780.0, ans=0.125 2024-08-12 18:42:00,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1770780.0, ans=0.0 2024-08-12 18:42:15,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1770880.0, ans=0.125 2024-08-12 18:42:18,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1770980.0, ans=0.125 2024-08-12 18:42:21,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1770980.0, ans=0.125 2024-08-12 18:42:25,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1770980.0, ans=0.125 2024-08-12 18:42:34,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3200, loss[loss=0.07949, beats_loss=0.01247, ecapa_loss=0.0002086, whisper_loss=0.06493, over 20811.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01101, ecapa_loss=0.0001743, whisper_loss=0.09161, over 3893785.45 frames.
], batch size: 89, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:43:02,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.434e+01 2.699e+01 3.191e+01 8.641e+01, threshold=5.397e+01, percent-clipped=3.0 2024-08-12 18:43:13,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1771280.0, ans=0.0 2024-08-12 18:43:14,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1771280.0, ans=0.125 2024-08-12 18:43:32,681 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 23 from Vox, 17 from AS 2024-08-12 18:43:32,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1771480.0, ans=0.5 2024-08-12 18:43:46,073 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3250, loss[loss=0.1118, beats_loss=0.01018, ecapa_loss=0.000182, whisper_loss=0.09982, over 19416.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001741, whisper_loss=0.09197, over 3896056.31 frames. ], batch size: 78, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:43:46,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1771580.0, ans=0.125 2024-08-12 18:44:05,019 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:44:30,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1771880.0, ans=0.1 2024-08-12 18:44:44,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1771980.0, ans=0.0 2024-08-12 18:44:54,678 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts.
11 from LS+wenet, 20 from Vox, 22 from AS 2024-08-12 18:44:58,868 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3300, loss[loss=0.08911, beats_loss=0.01333, ecapa_loss=0.0001591, whisper_loss=0.07419, over 22206.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001751, whisper_loss=0.09155, over 3903337.83 frames. ], batch size: 90, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:45:26,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.521e+01 2.800e+01 3.274e+01 5.621e+01, threshold=5.601e+01, percent-clipped=1.0 2024-08-12 18:45:28,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1772280.0, ans=0.125 2024-08-12 18:45:38,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1772280.0, ans=0.125 2024-08-12 18:45:41,480 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.406e-01 2024-08-12 18:45:42,496 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 from AS 2024-08-12 18:45:58,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1772480.0, ans=0.1 2024-08-12 18:46:01,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1772480.0, ans=0.0 2024-08-12 18:46:10,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1772580.0, ans=0.125 2024-08-12 18:46:11,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3350, loss[loss=0.1122, beats_loss=0.01112, ecapa_loss=0.0001605, whisper_loss=0.09946, over 22281.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01094, ecapa_loss=0.0001745, whisper_loss=0.09206, over 3914568.59 frames.
], batch size: 90, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:46:12,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1772580.0, ans=0.0 2024-08-12 18:46:32,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1772680.0, ans=0.125 2024-08-12 18:46:52,648 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 from AS 2024-08-12 18:47:05,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1772880.0, ans=0.0 2024-08-12 18:47:10,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1772980.0, ans=0.125 2024-08-12 18:47:14,239 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 12 from Vox, 32 from AS 2024-08-12 18:47:22,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3400, loss[loss=0.1083, beats_loss=0.01194, ecapa_loss=0.0001153, whisper_loss=0.09517, over 19123.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.011, ecapa_loss=0.0001735, whisper_loss=0.09167, over 3913371.61 frames. ], batch size: 72, lr: 4.90e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:47:27,467 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 from AS 2024-08-12 18:47:31,535 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 from AS 2024-08-12 18:47:35,914 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-12 18:47:41,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1773180.0, ans=0.1 2024-08-12 18:47:49,467 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts.
25 from LS+wenet, 20 from Vox, 48 from AS 2024-08-12 18:47:50,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.407e+01 2.669e+01 3.067e+01 7.735e+01, threshold=5.339e+01, percent-clipped=1.0 2024-08-12 18:47:58,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2024-08-12 18:48:06,689 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 22 from Vox, 20 from AS 2024-08-12 18:48:10,959 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 21 from LS+wenet, 33 from Vox, 30 from AS 2024-08-12 18:48:13,897 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:48:21,728 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:48:28,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1773480.0, ans=0.1 2024-08-12 18:48:36,965 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3450, loss[loss=0.09352, beats_loss=0.01357, ecapa_loss=0.0001588, whisper_loss=0.07835, over 21462.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.011, ecapa_loss=0.0001739, whisper_loss=0.0916, over 3909374.51 frames. ], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:49:28,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-12 18:49:36,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs.
limit=15.0 2024-08-12 18:49:41,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1773980.0, ans=0.0 2024-08-12 18:49:46,383 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 15 from Vox, 21 from AS 2024-08-12 18:49:47,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3500, loss[loss=0.1253, beats_loss=0.008282, ecapa_loss=0.0001928, whisper_loss=0.1151, over 16184.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001743, whisper_loss=0.09136, over 3912199.53 frames. ], batch size: 60, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:49:47,851 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-12 18:50:14,608 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.564e+01 2.746e+01 3.042e+01 5.198e+01, threshold=5.491e+01, percent-clipped=0.0 2024-08-12 18:50:33,944 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 from AS 2024-08-12 18:50:44,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1774480.0, ans=0.125 2024-08-12 18:50:48,873 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 32 from Vox, 32 from AS 2024-08-12 18:50:58,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3550, loss[loss=0.1085, beats_loss=0.009291, ecapa_loss=0.0001833, whisper_loss=0.09741, over 19260.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001754, whisper_loss=0.09117, over 3907587.11 frames.
], batch size: 76, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:51:11,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1774680.0, ans=0.1 2024-08-12 18:51:15,995 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 from AS 2024-08-12 18:51:36,204 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 16 from Vox, 39 from AS 2024-08-12 18:51:53,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1774880.0, ans=0.09899494936611666 2024-08-12 18:51:54,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1774880.0, ans=0.125 2024-08-12 18:52:11,216 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3600, loss[loss=0.1038, beats_loss=0.01047, ecapa_loss=0.0001988, whisper_loss=0.09132, over 13531.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001746, whisper_loss=0.09197, over 3889097.70 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:52:27,935 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 from AS 2024-08-12 18:52:37,437 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 from AS 2024-08-12 18:52:37,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1775180.0, ans=0.0 2024-08-12 18:52:38,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.432e+01 2.743e+01 3.098e+01 5.002e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-12 18:52:55,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.92 vs.
limit=10.0 2024-08-12 18:53:00,800 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 18:53:12,136 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 12 from Vox, 29 from AS 2024-08-12 18:53:23,525 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3650, loss[loss=0.09787, beats_loss=0.01001, ecapa_loss=0.0001395, whisper_loss=0.08646, over 19439.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001735, whisper_loss=0.09163, over 3889670.60 frames. ], batch size: 71, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:53:29,460 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 17 from Vox, 42 from AS 2024-08-12 18:53:37,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2024-08-12 18:53:39,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1775680.0, ans=0.0 2024-08-12 18:53:52,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1775780.0, ans=0.125 2024-08-12 18:54:09,026 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 22 from LS+wenet, 19 from Vox, 18 from AS 2024-08-12 18:54:33,095 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 from AS 2024-08-12 18:54:36,024 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3700, loss[loss=0.1118, beats_loss=0.01209, ecapa_loss=0.0001503, whisper_loss=0.09821, over 22444.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01095, ecapa_loss=0.0001728, whisper_loss=0.09139, over 3896372.76 frames.
], batch size: 89, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:54:36,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1776080.0, ans=0.0 2024-08-12 18:54:41,643 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS 2024-08-12 18:54:46,152 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 from AS 2024-08-12 18:54:55,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-08-12 18:55:03,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.392e+01 2.654e+01 3.110e+01 5.350e+01, threshold=5.308e+01, percent-clipped=0.0 2024-08-12 18:55:16,446 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 from AS 2024-08-12 18:55:20,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1776380.0, ans=0.0 2024-08-12 18:55:24,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1776380.0, ans=0.1 2024-08-12 18:55:48,257 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3750, loss[loss=0.1113, beats_loss=0.01125, ecapa_loss=0.0001584, whisper_loss=0.09845, over 21994.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01101, ecapa_loss=0.0001728, whisper_loss=0.09118, over 3881982.08 frames.
], batch size: 88, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:55:51,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1776580.0, ans=0.1 2024-08-12 18:56:01,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1776680.0, ans=0.125 2024-08-12 18:56:20,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1776780.0, ans=0.1 2024-08-12 18:56:31,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1776780.0, ans=0.125 2024-08-12 18:57:03,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3800, loss[loss=0.09237, beats_loss=0.01351, ecapa_loss=0.0001507, whisper_loss=0.07736, over 22709.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01107, ecapa_loss=0.0001736, whisper_loss=0.09043, over 3871345.63 frames. ], batch size: 91, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:57:03,416 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
33 from LS+wenet, 14 from Vox, 38 from AS 2024-08-12 18:57:03,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1777080.0, ans=0.125 2024-08-12 18:57:03,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1777080.0, ans=0.2 2024-08-12 18:57:07,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1777080.0, ans=0.0 2024-08-12 18:57:10,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1777080.0, ans=0.125 2024-08-12 18:57:31,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.486e+01 2.799e+01 3.183e+01 6.177e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 18:57:33,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1777280.0, ans=0.0 2024-08-12 18:57:36,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1777280.0, ans=0.0 2024-08-12 18:57:38,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1777280.0, ans=0.1 2024-08-12 18:57:38,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1777280.0, ans=0.125 2024-08-12 18:57:44,379 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS 2024-08-12 18:57:45,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1777280.0, ans=0.125 2024-08-12 18:58:09,290 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts.
16 from LS+wenet, 21 from Vox, 25 from AS 2024-08-12 18:58:19,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=12.0 2024-08-12 18:58:19,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3850, loss[loss=0.09845, beats_loss=0.01108, ecapa_loss=0.0001702, whisper_loss=0.08567, over 21734.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01106, ecapa_loss=0.0001731, whisper_loss=0.0907, over 3852225.14 frames. ], batch size: 89, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:58:44,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1777680.0, ans=0.125 2024-08-12 18:58:47,762 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-12 18:58:48,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1777680.0, ans=0.2 2024-08-12 18:58:48,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-08-12 18:59:21,770 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 27 from Vox, 42 from AS 2024-08-12 18:59:36,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3900, loss[loss=0.1134, beats_loss=0.01018, ecapa_loss=0.0001791, whisper_loss=0.1014, over 15630.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01114, ecapa_loss=0.0001726, whisper_loss=0.09054, over 3880792.13 frames.
], batch size: 62, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 18:59:42,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1778080.0, ans=0.125 2024-08-12 18:59:45,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1778080.0, ans=0.1 2024-08-12 19:00:02,669 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 from AS 2024-08-12 19:00:05,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.460e+01 2.720e+01 3.134e+01 5.284e+01, threshold=5.440e+01, percent-clipped=0.0 2024-08-12 19:00:19,987 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 23 from Vox, 31 from AS 2024-08-12 19:00:34,725 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 11 from Vox, 31 from AS 2024-08-12 19:00:37,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-12 19:00:42,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2024-08-12 19:00:50,694 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 10 from LS+wenet, 18 from Vox, 25 from AS 2024-08-12 19:00:53,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 3950, loss[loss=0.09605, beats_loss=0.01194, ecapa_loss=0.0001905, whisper_loss=0.08221, over 21795.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0111, ecapa_loss=0.0001742, whisper_loss=0.09108, over 3896291.28 frames.
], batch size: 90, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:01:10,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1778680.0, ans=0.1 2024-08-12 19:01:16,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1778680.0, ans=0.2 2024-08-12 19:01:24,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1778780.0, ans=0.2 2024-08-12 19:01:26,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2024-08-12 19:01:39,449 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS 2024-08-12 19:01:51,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1778880.0, ans=0.125 2024-08-12 19:01:54,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1778980.0, ans=0.125 2024-08-12 19:02:06,304 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 from AS 2024-08-12 19:02:08,806 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4000, loss[loss=0.09958, beats_loss=0.01321, ecapa_loss=0.0001901, whisper_loss=0.08446, over 21733.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01102, ecapa_loss=0.0001761, whisper_loss=0.09121, over 3895536.27 frames. ], batch size: 90, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:02:20,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1779080.0, ans=0.07 2024-08-12 19:02:38,516 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts.
19 from LS+wenet, 27 from Vox, 30 from AS 2024-08-12 19:02:39,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.414e+01 2.670e+01 2.988e+01 4.666e+01, threshold=5.339e+01, percent-clipped=0.0 2024-08-12 19:02:41,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1779280.0, ans=0.0 2024-08-12 19:02:43,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1779280.0, ans=0.2 2024-08-12 19:02:47,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1779280.0, ans=0.125 2024-08-12 19:03:04,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1779380.0, ans=0.125 2024-08-12 19:03:17,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1779480.0, ans=0.125 2024-08-12 19:03:28,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1779580.0, ans=0.125 2024-08-12 19:03:29,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4050, loss[loss=0.1004, beats_loss=0.01119, ecapa_loss=0.0002236, whisper_loss=0.08701, over 16959.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001763, whisper_loss=0.09152, over 3883260.02 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:03:48,716 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts.
18 from LS+wenet, 10 from Vox, 26 from AS 2024-08-12 19:04:02,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1779780.0, ans=0.0 2024-08-12 19:04:02,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1779780.0, ans=0.125 2024-08-12 19:04:04,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1779780.0, ans=0.125 2024-08-12 19:04:14,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1779880.0, ans=0.125 2024-08-12 19:04:18,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0 2024-08-12 19:04:48,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4100, loss[loss=0.09745, beats_loss=0.009928, ecapa_loss=0.0001644, whisper_loss=0.08588, over 18536.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01091, ecapa_loss=0.0001749, whisper_loss=0.09163, over 3881016.46 frames. ], batch size: 71, lr: 4.89e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:04:57,688 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 25 from Vox, 45 from AS 2024-08-12 19:04:59,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-12 19:05:04,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.16 vs.
limit=15.0 2024-08-12 19:05:08,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1780180.0, ans=0.1 2024-08-12 19:05:15,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-12 19:05:16,936 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.487e+01 2.905e+01 3.188e+01 5.523e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-12 19:05:23,644 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 25 from Vox, 25 from AS 2024-08-12 19:05:40,635 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-12 19:05:53,236 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS 2024-08-12 19:06:00,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1780480.0, ans=0.0 2024-08-12 19:06:05,885 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 22 from LS+wenet, 33 from Vox, 34 from AS 2024-08-12 19:06:07,553 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4150, loss[loss=0.09303, beats_loss=0.01068, ecapa_loss=0.0002049, whisper_loss=0.0803, over 21159.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01097, ecapa_loss=0.0001761, whisper_loss=0.09069, over 3877446.07 frames. ], batch size: 89, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:06:13,941 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 16 from LS+wenet, 22 from Vox, 42 from AS 2024-08-12 19:06:52,093 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 13 from Vox, 47 from AS 2024-08-12 19:06:56,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.77 vs. 
limit=22.5 2024-08-12 19:06:57,405 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 from AS 2024-08-12 19:07:02,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1780880.0, ans=0.125 2024-08-12 19:07:22,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1780980.0, ans=0.125 2024-08-12 19:07:26,779 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4200, loss[loss=0.1055, beats_loss=0.01196, ecapa_loss=0.0001595, whisper_loss=0.09196, over 22936.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001757, whisper_loss=0.09151, over 3921270.08 frames. ], batch size: 91, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:07:30,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1781080.0, ans=0.1 2024-08-12 19:07:33,701 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 21 from Vox, 33 from AS 2024-08-12 19:07:36,817 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 from AS 2024-08-12 19:07:49,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1781180.0, ans=0.0 2024-08-12 19:07:52,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1781180.0, ans=0.1 2024-08-12 19:07:52,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.833e-02 2024-08-12 19:07:56,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.440e+01 2.909e+01 3.594e+01 1.116e+02, threshold=5.819e+01, percent-clipped=3.0 2024-08-12 19:08:02,913 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 from AS 2024-08-12 19:08:28,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-12 19:08:47,072 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 25 from Vox, 30 from AS 2024-08-12 19:08:49,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4250, loss[loss=0.1044, beats_loss=0.009789, ecapa_loss=0.0002308, whisper_loss=0.09227, over 19304.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001753, whisper_loss=0.09151, over 3918203.11 frames. ], batch size: 81, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:08:58,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1781580.0, ans=0.125 2024-08-12 19:09:02,561 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 13 from Vox, 38 from AS 2024-08-12 19:09:24,669 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 17 from LS+wenet, 31 from Vox, 39 from AS 2024-08-12 19:09:26,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1781780.0, ans=0.1 2024-08-12 19:09:52,362 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 30 from LS+wenet, 21 from Vox, 21 from AS 2024-08-12 19:09:54,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5 2024-08-12 19:09:56,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1781980.0, ans=0.0 2024-08-12 19:10:08,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4300, loss[loss=0.08831, beats_loss=0.009532, ecapa_loss=0.0001899, whisper_loss=0.07688, over 16727.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01089, ecapa_loss=0.0001757, whisper_loss=0.09237, over 3899581.53 frames. ], batch size: 69, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:10:08,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1782080.0, ans=0.2 2024-08-12 19:10:28,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1782180.0, ans=0.2 2024-08-12 19:10:32,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-12 19:10:37,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.369e+01 2.676e+01 2.998e+01 4.612e+01, threshold=5.352e+01, percent-clipped=0.0 2024-08-12 19:10:39,857 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS 2024-08-12 19:10:54,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1782380.0, ans=0.09899494936611666 2024-08-12 19:11:09,795 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-12 19:11:11,299 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 from AS 2024-08-12 19:11:18,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1782480.0, ans=0.125 2024-08-12 19:11:23,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1782480.0, ans=0.125 2024-08-12 19:11:25,681 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 14 from Vox, 35 from AS 2024-08-12 19:11:27,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4350, loss[loss=0.1124, beats_loss=0.01066, ecapa_loss=0.0001356, whisper_loss=0.1004, over 19680.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.000176, whisper_loss=0.09222, over 3862498.27 frames. ], batch size: 74, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:11:32,119 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 19 from Vox, 30 from AS 2024-08-12 19:11:33,464 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 from AS 2024-08-12 19:11:41,395 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 from AS 2024-08-12 19:12:13,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1782880.0, ans=0.125 2024-08-12 19:12:44,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1782980.0, ans=0.125 2024-08-12 19:12:47,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-12 19:12:49,823 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4400, loss[loss=0.1036, beats_loss=0.01211, ecapa_loss=0.0001513, whisper_loss=0.08994, over 23270.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001745, whisper_loss=0.09163, over 3898049.61 frames. 
], batch size: 92, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:13:00,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1783080.0, ans=0.2 2024-08-12 19:13:00,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1783080.0, ans=0.125 2024-08-12 19:13:21,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.411e+01 2.660e+01 2.962e+01 4.713e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-12 19:13:25,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1783280.0, ans=0.125 2024-08-12 19:13:36,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1783280.0, ans=0.125 2024-08-12 19:13:47,255 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 from AS 2024-08-12 19:14:07,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1783480.0, ans=0.125 2024-08-12 19:14:07,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1783480.0, ans=0.125 2024-08-12 19:14:13,482 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4450, loss[loss=0.1011, beats_loss=0.0111, ecapa_loss=0.0001596, whisper_loss=0.08843, over 17362.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01083, ecapa_loss=0.0001739, whisper_loss=0.09187, over 3874724.73 frames. ], batch size: 68, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:14:13,742 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 from AS 2024-08-12 19:14:15,210 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
21 from LS+wenet, 14 from Vox, 21 from AS 2024-08-12 19:14:18,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-12 19:14:37,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-12 19:14:38,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1783680.0, ans=0.125 2024-08-12 19:14:44,652 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 from AS 2024-08-12 19:15:33,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1783980.0, ans=0.0 2024-08-12 19:15:41,531 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4500, loss[loss=0.1191, beats_loss=0.008354, ecapa_loss=0.0001766, whisper_loss=0.109, over 22867.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01085, ecapa_loss=0.0001739, whisper_loss=0.09122, over 3852711.31 frames. ], batch size: 91, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:15:57,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1784180.0, ans=0.125 2024-08-12 19:16:13,382 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.482e+01 2.920e+01 3.537e+01 6.104e+01, threshold=5.841e+01, percent-clipped=3.0 2024-08-12 19:16:21,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-08-12 19:16:22,205 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
27 from LS+wenet, 14 from Vox, 24 from AS 2024-08-12 19:17:07,603 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4550, loss[loss=0.127, beats_loss=0.00813, ecapa_loss=0.0001729, whisper_loss=0.1171, over 21621.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001733, whisper_loss=0.09125, over 3869958.79 frames. ], batch size: 85, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:17:09,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1784580.0, ans=0.0 2024-08-12 19:17:17,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1784580.0, ans=0.125 2024-08-12 19:17:22,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1784580.0, ans=15.0 2024-08-12 19:18:11,048 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 19:18:24,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1784980.0, ans=0.0 2024-08-12 19:18:33,607 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4600, loss[loss=0.117, beats_loss=0.009761, ecapa_loss=0.0001529, whisper_loss=0.1057, over 14330.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001733, whisper_loss=0.09173, over 3880879.89 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:19:00,818 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
24 from LS+wenet, 17 from Vox, 40 from AS 2024-08-12 19:19:01,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1785180.0, ans=0.125 2024-08-12 19:19:04,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.452e+01 2.765e+01 3.164e+01 4.953e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-12 19:19:07,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1785280.0, ans=0.125 2024-08-12 19:19:13,177 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 25 from LS+wenet, 16 from Vox, 28 from AS 2024-08-12 19:19:19,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1785380.0, ans=0.1 2024-08-12 19:19:25,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1785380.0, ans=0.0 2024-08-12 19:19:33,327 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS 2024-08-12 19:19:33,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1785380.0, ans=0.1 2024-08-12 19:19:38,464 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 from AS 2024-08-12 19:19:40,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1785480.0, ans=0.125 2024-08-12 19:19:52,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4650, loss[loss=0.09734, beats_loss=0.01228, ecapa_loss=0.0001748, whisper_loss=0.08332, over 21754.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001726, whisper_loss=0.0909, over 3863757.43 frames. 
], batch size: 86, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:19:53,758 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 from AS 2024-08-12 19:19:55,045 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 from AS 2024-08-12 19:19:57,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-12 19:20:10,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1785680.0, ans=0.125 2024-08-12 19:20:17,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1785680.0, ans=0.1 2024-08-12 19:20:27,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1785780.0, ans=0.0 2024-08-12 19:20:50,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1785880.0, ans=0.0 2024-08-12 19:21:11,581 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS 2024-08-12 19:21:12,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4700, loss[loss=0.09214, beats_loss=0.009908, ecapa_loss=0.00015, whisper_loss=0.08073, over 17766.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001722, whisper_loss=0.09161, over 3881044.41 frames. 
], batch size: 70, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:21:26,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1786080.0, ans=0.0 2024-08-12 19:21:43,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.537e+01 2.789e+01 3.116e+01 4.712e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-12 19:22:08,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1786380.0, ans=0.125 2024-08-12 19:22:09,801 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 21 from Vox, 21 from AS 2024-08-12 19:22:19,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1786480.0, ans=0.0 2024-08-12 19:22:25,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1786480.0, ans=0.125 2024-08-12 19:22:32,899 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4750, loss[loss=0.1095, beats_loss=0.0101, ecapa_loss=0.0002011, whisper_loss=0.09734, over 22374.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001715, whisper_loss=0.09149, over 3889329.56 frames. ], batch size: 96, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:23:04,332 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 from AS 2024-08-12 19:23:07,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. limit=6.0 2024-08-12 19:23:08,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1786780.0, ans=0.5 2024-08-12 19:23:11,436 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 21 from Vox, 35 from AS 2024-08-12 19:23:14,908 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 32 from Vox, 41 from AS 2024-08-12 19:23:27,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1786880.0, ans=0.125 2024-08-12 19:23:29,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1786880.0, ans=0.0 2024-08-12 19:23:41,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1786980.0, ans=0.1 2024-08-12 19:23:50,888 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4800, loss[loss=0.1255, beats_loss=0.009824, ecapa_loss=0.0001883, whisper_loss=0.1137, over 22724.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01089, ecapa_loss=0.0001737, whisper_loss=0.09148, over 3888333.79 frames. ], batch size: 88, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:24:19,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1787180.0, ans=0.125 2024-08-12 19:24:20,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.537e+01 2.789e+01 3.212e+01 6.421e+01, threshold=5.577e+01, percent-clipped=1.0 2024-08-12 19:24:26,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1787280.0, ans=0.125 2024-08-12 19:24:41,791 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
29 from LS+wenet, 24 from Vox, 32 from AS 2024-08-12 19:25:09,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1787580.0, ans=0.1 2024-08-12 19:25:10,356 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4850, loss[loss=0.1194, beats_loss=0.01162, ecapa_loss=0.0002043, whisper_loss=0.1058, over 20458.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001723, whisper_loss=0.09114, over 3886350.60 frames. ], batch size: 83, lr: 4.88e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:25:20,130 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS 2024-08-12 19:25:35,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1787680.0, ans=0.125 2024-08-12 19:25:37,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1787680.0, ans=0.0 2024-08-12 19:25:38,077 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 from AS 2024-08-12 19:25:39,768 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-12 19:26:01,120 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-12 19:26:10,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-08-12 19:26:13,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2024-08-12 19:26:17,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.64 vs. 
limit=12.0 2024-08-12 19:26:28,999 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 12 from Vox, 43 from AS 2024-08-12 19:26:35,121 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4900, loss[loss=0.1094, beats_loss=0.009915, ecapa_loss=0.0001561, whisper_loss=0.09788, over 16403.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01099, ecapa_loss=0.0001707, whisper_loss=0.0922, over 3924483.17 frames. ], batch size: 64, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:26:41,657 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 from AS 2024-08-12 19:26:52,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1788180.0, ans=0.0 2024-08-12 19:27:01,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1788180.0, ans=0.125 2024-08-12 19:27:06,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.493e+01 2.714e+01 3.066e+01 4.979e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-12 19:27:11,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2024-08-12 19:27:14,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1788280.0, ans=0.0 2024-08-12 19:27:17,697 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-12 19:27:28,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1788380.0, ans=0.125 2024-08-12 19:27:35,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1788380.0, ans=0.0 2024-08-12 19:27:48,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-12 19:27:53,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1788480.0, ans=0.125 2024-08-12 19:27:55,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1788580.0, ans=0.125 2024-08-12 19:27:56,331 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 4950, loss[loss=0.1312, beats_loss=0.008725, ecapa_loss=0.0001792, whisper_loss=0.1207, over 19923.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01099, ecapa_loss=0.0001711, whisper_loss=0.09154, over 3917902.81 frames. ], batch size: 77, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:28:25,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.14 vs. limit=15.0 2024-08-12 19:28:27,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. 
limit=15.0 2024-08-12 19:28:28,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1788780.0, ans=0.125 2024-08-12 19:28:58,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1788980.0, ans=0.125 2024-08-12 19:29:15,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5000, loss[loss=0.1212, beats_loss=0.007832, ecapa_loss=0.0001671, whisper_loss=0.1117, over 15283.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001701, whisper_loss=0.09151, over 3879868.12 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:29:31,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.49 vs. limit=15.0 2024-08-12 19:29:40,990 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 27 from LS+wenet, 15 from Vox, 22 from AS 2024-08-12 19:29:41,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1789180.0, ans=0.125 2024-08-12 19:29:47,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.488e+01 2.839e+01 3.204e+01 5.431e+01, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 19:30:12,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1789380.0, ans=0.125 2024-08-12 19:30:13,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1789380.0, ans=0.0 2024-08-12 19:30:26,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. 
limit=15.0 2024-08-12 19:30:30,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2024-08-12 19:30:38,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5050, loss[loss=0.1055, beats_loss=0.009417, ecapa_loss=0.0001585, whisper_loss=0.09455, over 18159.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01096, ecapa_loss=0.0001711, whisper_loss=0.09252, over 3889089.87 frames. ], batch size: 70, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:30:47,538 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 from AS 2024-08-12 19:30:49,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-08-12 19:30:57,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1789680.0, ans=0.2 2024-08-12 19:31:02,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1789680.0, ans=0.2 2024-08-12 19:31:14,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1789780.0, ans=0.125 2024-08-12 19:31:20,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-08-12 19:31:31,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=12.0 2024-08-12 19:31:42,755 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 28 from Vox, 39 from AS 2024-08-12 19:31:44,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1789980.0, ans=0.125 2024-08-12 19:31:48,108 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 from AS 2024-08-12 19:31:52,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1790080.0, ans=0.0 2024-08-12 19:31:54,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5100, loss[loss=0.1181, beats_loss=0.01015, ecapa_loss=0.0001733, whisper_loss=0.1062, over 19872.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01105, ecapa_loss=0.000171, whisper_loss=0.09209, over 3899220.83 frames. ], batch size: 79, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:31:59,564 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS 2024-08-12 19:32:02,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1790080.0, ans=0.125 2024-08-12 19:32:03,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1790080.0, ans=0.125 2024-08-12 19:32:09,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. 
limit=15.0 2024-08-12 19:32:20,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.471e+01 2.779e+01 3.135e+01 9.153e+01, threshold=5.559e+01, percent-clipped=1.0 2024-08-12 19:32:23,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1790280.0, ans=0.125 2024-08-12 19:33:02,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5150, loss[loss=0.105, beats_loss=0.008992, ecapa_loss=0.0001923, whisper_loss=0.09411, over 15707.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01101, ecapa_loss=0.0001716, whisper_loss=0.09233, over 3884732.37 frames. ], batch size: 60, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:33:10,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1790580.0, ans=0.0 2024-08-12 19:33:23,169 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-12 19:33:26,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1790680.0, ans=0.125 2024-08-12 19:33:39,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1790780.0, ans=0.125 2024-08-12 19:33:40,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1790780.0, ans=0.125 2024-08-12 19:33:41,965 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 19:33:42,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1790880.0, ans=0.125 2024-08-12 19:33:59,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1790980.0, ans=0.1 2024-08-12 19:34:10,398 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5200, loss[loss=0.1272, beats_loss=0.008864, ecapa_loss=0.0001357, whisper_loss=0.117, over 20145.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001711, whisper_loss=0.0921, over 3872296.42 frames. ], batch size: 74, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:34:12,199 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=8.133e-03 2024-08-12 19:34:17,324 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 19:34:25,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1791180.0, ans=0.125 2024-08-12 19:34:36,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.499e+01 2.713e+01 3.001e+01 1.517e+02, threshold=5.426e+01, percent-clipped=1.0 2024-08-12 19:35:09,501 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-12 19:35:12,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2024-08-12 19:35:19,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5250, loss[loss=0.1158, beats_loss=0.01263, ecapa_loss=0.0001564, whisper_loss=0.1016, over 20254.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01092, ecapa_loss=0.0001715, whisper_loss=0.09257, over 3842491.84 frames. 
], batch size: 82, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:35:56,953 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 19:36:24,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1791980.0, ans=0.125 2024-08-12 19:36:28,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5300, loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001814, whisper_loss=0.09124, over 20459.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01083, ecapa_loss=0.000172, whisper_loss=0.09325, over 3841075.60 frames. ], batch size: 80, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:36:54,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.416e+01 2.797e+01 3.236e+01 7.041e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 19:36:57,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1792280.0, ans=0.0 2024-08-12 19:37:25,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1792480.0, ans=0.125 2024-08-12 19:37:31,822 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 19:37:32,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1792480.0, ans=0.1 2024-08-12 19:37:34,517 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 19:37:35,826 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5350, loss[loss=0.1155, beats_loss=0.00933, ecapa_loss=0.0001877, whisper_loss=0.1043, over 19212.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001715, whisper_loss=0.09271, over 3804642.01 frames. 
], batch size: 77, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:37:58,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1792680.0, ans=0.125 2024-08-12 19:38:00,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1792680.0, ans=0.07 2024-08-12 19:38:08,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.08 vs. limit=22.5 2024-08-12 19:38:09,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1792780.0, ans=0.125 2024-08-12 19:38:30,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1792980.0, ans=0.125 2024-08-12 19:38:44,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5400, loss[loss=0.08728, beats_loss=0.01056, ecapa_loss=0.0001556, whisper_loss=0.07517, over 16089.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01085, ecapa_loss=0.0001734, whisper_loss=0.09256, over 3798874.25 frames. ], batch size: 63, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:39:09,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.477e+01 2.760e+01 3.199e+01 8.149e+01, threshold=5.520e+01, percent-clipped=2.0 2024-08-12 19:39:11,326 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-12 19:39:46,532 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 19:39:47,739 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 19:39:47,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1793480.0, ans=0.0 2024-08-12 19:39:49,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1793480.0, ans=0.125 2024-08-12 19:39:52,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1793580.0, ans=0.125 2024-08-12 19:39:53,575 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5450, loss[loss=0.09454, beats_loss=0.0129, ecapa_loss=0.0001669, whisper_loss=0.07998, over 14804.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.000173, whisper_loss=0.0916, over 3805965.75 frames. ], batch size: 62, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:39:56,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1793580.0, ans=0.0 2024-08-12 19:39:58,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1793580.0, ans=0.1 2024-08-12 19:40:02,841 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-12 19:40:11,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1793680.0, ans=0.125 2024-08-12 19:40:12,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
limit=15.0 2024-08-12 19:40:28,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1793780.0, ans=0.05 2024-08-12 19:40:52,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1793980.0, ans=0.2 2024-08-12 19:41:00,836 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-12 19:41:06,634 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5500, loss[loss=0.1202, beats_loss=0.00781, ecapa_loss=0.0001853, whisper_loss=0.1106, over 21271.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001735, whisper_loss=0.0921, over 3832368.62 frames. ], batch size: 82, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:41:06,909 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 19:41:11,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1794080.0, ans=0.2 2024-08-12 19:41:12,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1794080.0, ans=0.125 2024-08-12 19:41:15,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1794080.0, ans=0.125 2024-08-12 19:41:22,182 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-12 19:41:22,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1794180.0, ans=0.07 2024-08-12 19:41:32,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.427e+01 2.827e+01 3.059e+01 4.853e+01, threshold=5.654e+01, percent-clipped=0.0 2024-08-12 19:41:54,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1794380.0, ans=0.125 2024-08-12 19:41:54,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1794380.0, ans=0.125 2024-08-12 19:42:06,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1794480.0, ans=0.0 2024-08-12 19:42:10,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1794480.0, ans=0.5 2024-08-12 19:42:24,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5550, loss[loss=0.1121, beats_loss=0.009359, ecapa_loss=0.0001601, whisper_loss=0.1011, over 23094.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01087, ecapa_loss=0.0001731, whisper_loss=0.09292, over 3881111.66 frames. ], batch size: 91, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:42:26,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1794580.0, ans=0.1 2024-08-12 19:42:35,324 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 19:42:38,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1794580.0, ans=0.05 2024-08-12 19:42:42,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1794680.0, ans=0.125 2024-08-12 19:42:48,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-12 19:43:00,492 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 19:43:22,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-08-12 19:43:30,372 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 22 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 19:43:47,257 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-12 19:43:47,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1794980.0, ans=0.1 2024-08-12 19:43:49,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5600, loss[loss=0.1034, beats_loss=0.01101, ecapa_loss=0.0001432, whisper_loss=0.09098, over 17302.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01083, ecapa_loss=0.0001716, whisper_loss=0.09335, over 3926949.21 frames. 
], batch size: 67, lr: 4.87e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:43:50,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1795080.0, ans=0.125 2024-08-12 19:44:24,203 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.502e+01 2.768e+01 3.142e+01 4.658e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-12 19:44:45,666 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-12 19:44:54,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1795380.0, ans=0.035 2024-08-12 19:45:01,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1795480.0, ans=0.125 2024-08-12 19:45:22,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5650, loss[loss=0.09113, beats_loss=0.01244, ecapa_loss=0.0001744, whisper_loss=0.07695, over 16418.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.0108, ecapa_loss=0.0001732, whisper_loss=0.09279, over 3903473.77 frames. ], batch size: 69, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:45:35,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1795580.0, ans=0.125 2024-08-12 19:45:36,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0 2024-08-12 19:45:38,058 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 19:45:48,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1795680.0, ans=0.125 2024-08-12 19:45:50,083 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 19:45:56,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1795680.0, ans=0.125 2024-08-12 19:46:00,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1795780.0, ans=0.0 2024-08-12 19:46:04,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1795780.0, ans=0.0 2024-08-12 19:46:25,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=15.0 2024-08-12 19:46:27,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-12 19:46:42,568 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.016e+01 2024-08-12 19:46:46,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-08-12 19:46:51,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0 2024-08-12 19:46:53,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1795980.0, ans=0.125 2024-08-12 19:46:56,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5700, loss[loss=0.111, beats_loss=0.01242, ecapa_loss=0.000155, whisper_loss=0.09699, over 19846.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01083, ecapa_loss=0.0001729, whisper_loss=0.09327, over 3886776.96 frames. 
], batch size: 80, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:46:59,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.86 vs. limit=5.0 2024-08-12 19:47:03,069 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 20 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-12 19:47:04,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1796080.0, ans=0.0 2024-08-12 19:47:20,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1796180.0, ans=0.125 2024-08-12 19:47:25,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1796180.0, ans=0.0 2024-08-12 19:47:33,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.533e+01 2.876e+01 3.216e+01 4.377e+01, threshold=5.753e+01, percent-clipped=0.0 2024-08-12 19:47:46,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1796280.0, ans=0.0 2024-08-12 19:48:00,150 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-12 19:48:08,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1796380.0, ans=0.125 2024-08-12 19:48:15,318 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-12 19:48:17,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1796480.0, ans=0.1 2024-08-12 19:48:27,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1796480.0, ans=0.2 2024-08-12 19:48:30,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5750, loss[loss=0.1114, beats_loss=0.01033, ecapa_loss=0.0001424, whisper_loss=0.0996, over 17868.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01088, ecapa_loss=0.0001738, whisper_loss=0.09267, over 3884832.82 frames. ], batch size: 66, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:48:31,238 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 19:48:34,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-12 19:48:47,993 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 19:48:48,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1796680.0, ans=0.125 2024-08-12 19:50:01,501 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5800, loss[loss=0.08284, beats_loss=0.01346, ecapa_loss=0.0001327, whisper_loss=0.06805, over 18393.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01092, ecapa_loss=0.0001732, whisper_loss=0.09203, over 3878592.10 frames. ], batch size: 71, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:50:07,489 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 19:50:09,933 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 19:50:14,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1797180.0, ans=0.125 2024-08-12 19:50:20,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1797180.0, ans=0.0 2024-08-12 19:50:22,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1797180.0, ans=0.125 2024-08-12 19:50:27,800 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.440e+01 2.724e+01 3.167e+01 6.575e+01, threshold=5.447e+01, percent-clipped=2.0 2024-08-12 19:50:44,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=1797380.0, ans=15.0 2024-08-12 19:50:51,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-08-12 19:51:10,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1797480.0, ans=0.1 2024-08-12 19:51:14,169 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5850, loss[loss=0.09905, beats_loss=0.008495, ecapa_loss=0.0002639, whisper_loss=0.08791, over 13524.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01093, ecapa_loss=0.0001742, whisper_loss=0.09216, over 3882475.27 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:51:23,649 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 19:51:23,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1797580.0, ans=0.125 2024-08-12 19:51:24,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=12.0 2024-08-12 19:51:30,682 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-12 19:51:47,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1797780.0, ans=0.125 2024-08-12 19:51:56,670 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.244e+00 2024-08-12 19:51:59,296 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 19:52:17,909 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 19:52:18,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1797980.0, ans=0.125 2024-08-12 19:52:20,964 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 19:52:26,316 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5900, loss[loss=0.08066, beats_loss=0.01191, ecapa_loss=0.0001618, whisper_loss=0.06713, over 15395.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01094, ecapa_loss=0.0001725, whisper_loss=0.09238, over 3883392.34 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:52:35,333 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 19:52:48,097 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
21 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-12 19:52:54,637 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.654e+01 2.967e+01 3.336e+01 4.788e+01, threshold=5.934e+01, percent-clipped=0.0 2024-08-12 19:52:56,210 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 19:53:11,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-12 19:53:15,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1798380.0, ans=0.0 2024-08-12 19:53:15,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2024-08-12 19:53:23,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1798480.0, ans=0.0 2024-08-12 19:53:27,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2024-08-12 19:53:29,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1798480.0, ans=0.125 2024-08-12 19:53:38,614 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 5950, loss[loss=0.1012, beats_loss=0.01068, ecapa_loss=0.0001554, whisper_loss=0.08897, over 20885.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01097, ecapa_loss=0.0001719, whisper_loss=0.09195, over 3864841.90 frames. ], batch size: 81, lr: 4.86e-03, grad_scale: 1.152921504606847e+18 2024-08-12 19:53:51,373 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-12 19:53:58,313 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-12 19:54:23,584 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-12 19:54:26,280 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 19:54:30,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2024-08-12 19:54:32,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1798880.0, ans=0.0 2024-08-12 19:54:34,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1798880.0, ans=0.2 2024-08-12 19:54:34,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.34 vs. limit=15.0 2024-08-12 19:54:36,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1798880.0, ans=0.1 2024-08-12 19:54:43,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1798980.0, ans=0.0 2024-08-12 19:54:45,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1798980.0, ans=0.0 2024-08-12 19:54:49,377 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 19:54:53,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2024-08-12 19:54:55,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6000, loss[loss=0.1242, beats_loss=0.009506, ecapa_loss=0.0002063, whisper_loss=0.1127, over 22336.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.011, ecapa_loss=0.0001723, whisper_loss=0.09196, over 3884746.60 frames. ], batch size: 91, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:54:55,081 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 19:55:33,563 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005899, whisper_loss=0.2486, over 922467.00 frames. 2024-08-12 19:55:50,116 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on SV_voxceleb1: loss=0.004696, beats_loss=0, ecapa_loss=0.0004696, whisper_loss=0, over 939242.00 frames. 2024-08-12 19:57:46,514 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on AT_audioset: loss=0.02428, beats_loss=0.02428, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 19:57:46,519 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 19:57:52,146 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 19:57:55,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1799080.0, ans=0.0 2024-08-12 19:57:57,732 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 19:58:04,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. 
limit=15.0 2024-08-12 19:58:07,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1799180.0, ans=0.05 2024-08-12 19:58:16,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.501e+01 2.791e+01 3.141e+01 5.827e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-12 19:58:16,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1799280.0, ans=0.125 2024-08-12 19:58:25,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1799280.0, ans=0.04949747468305833 2024-08-12 19:58:33,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1799380.0, ans=0.125 2024-08-12 19:58:38,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1799380.0, ans=0.0 2024-08-12 19:58:54,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1799480.0, ans=0.5 2024-08-12 19:59:04,322 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6050, loss[loss=0.09913, beats_loss=0.01317, ecapa_loss=0.0001747, whisper_loss=0.08421, over 22465.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001713, whisper_loss=0.092, over 3877948.32 frames. ], batch size: 95, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 19:59:08,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1799580.0, ans=0.125 2024-08-12 19:59:11,883 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
33 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-12 19:59:22,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1799680.0, ans=0.125 2024-08-12 19:59:30,692 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 19:59:30,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1799680.0, ans=0.025 2024-08-12 19:59:31,970 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-12 19:59:43,337 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 19:59:55,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1799880.0, ans=0.0 2024-08-12 20:00:02,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1799880.0, ans=0.125 2024-08-12 20:00:09,862 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-180000.pt 2024-08-12 20:00:13,959 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 20:00:24,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6100, loss[loss=0.0907, beats_loss=0.01246, ecapa_loss=0.0001447, whisper_loss=0.07679, over 19832.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01099, ecapa_loss=0.000171, whisper_loss=0.09156, over 3863140.75 frames. 
], batch size: 79, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:00:31,910 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.930e+01 2024-08-12 20:00:55,120 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.407e+01 2.685e+01 3.141e+01 4.380e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-12 20:01:14,248 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 20:01:23,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1800380.0, ans=0.125 2024-08-12 20:01:25,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-12 20:01:26,161 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-12 20:01:40,606 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 20:01:42,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6150, loss[loss=0.09907, beats_loss=0.009825, ecapa_loss=0.0002007, whisper_loss=0.08724, over 18914.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01101, ecapa_loss=0.0001706, whisper_loss=0.09166, over 3869243.44 frames. 
], batch size: 78, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:01:46,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1800580.0, ans=0.125 2024-08-12 20:01:53,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1800580.0, ans=0.07 2024-08-12 20:02:09,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1800680.0, ans=0.1 2024-08-12 20:02:29,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2024-08-12 20:02:49,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1800980.0, ans=0.1 2024-08-12 20:02:49,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1800980.0, ans=0.0 2024-08-12 20:02:50,723 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 20:02:58,078 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6200, loss[loss=0.1155, beats_loss=0.00989, ecapa_loss=0.0001489, whisper_loss=0.1042, over 23408.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01111, ecapa_loss=0.0001703, whisper_loss=0.09078, over 3857267.90 frames. ], batch size: 89, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:03:00,075 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-12 20:03:19,940 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-12 20:03:24,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
limit=15.0 2024-08-12 20:03:27,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.462e+01 2.878e+01 3.273e+01 2.094e+02, threshold=5.757e+01, percent-clipped=3.0 2024-08-12 20:03:46,423 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-12 20:03:47,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-08-12 20:03:51,354 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 20:03:51,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1801380.0, ans=0.125 2024-08-12 20:04:05,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1801480.0, ans=0.0 2024-08-12 20:04:06,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2024-08-12 20:04:08,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1801480.0, ans=0.2 2024-08-12 20:04:13,892 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6250, loss[loss=0.09826, beats_loss=0.008893, ecapa_loss=0.0001993, whisper_loss=0.08738, over 15987.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01103, ecapa_loss=0.0001719, whisper_loss=0.09095, over 3868744.85 frames. ], batch size: 67, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:04:15,429 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 40 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 20:04:16,842 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 20:04:59,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.01 vs. limit=22.5 2024-08-12 20:05:25,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1801980.0, ans=0.125 2024-08-12 20:05:28,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6300, loss[loss=0.1203, beats_loss=0.009867, ecapa_loss=0.0001775, whisper_loss=0.1087, over 22851.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01098, ecapa_loss=0.0001718, whisper_loss=0.09156, over 3864586.82 frames. ], batch size: 90, lr: 4.86e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:05:28,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1802080.0, ans=0.125 2024-08-12 20:05:57,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.436e+01 2.696e+01 3.138e+01 5.310e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-12 20:06:12,846 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:06:14,800 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 20:06:17,407 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 20:06:21,361 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-12 20:06:24,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1802380.0, ans=0.025 2024-08-12 20:06:25,588 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-12 20:06:28,776 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 20:06:36,866 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 20:06:38,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-08-12 20:06:43,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6350, loss[loss=0.108, beats_loss=0.01042, ecapa_loss=0.0001473, whisper_loss=0.09614, over 16717.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01098, ecapa_loss=0.0001728, whisper_loss=0.09136, over 3842932.38 frames. ], batch size: 63, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:06:58,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1802680.0, ans=10.0 2024-08-12 20:07:00,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-12 20:07:04,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2024-08-12 20:07:16,060 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-12 20:07:17,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1802780.0, ans=0.0 2024-08-12 20:07:20,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1802780.0, ans=0.125 2024-08-12 20:07:40,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.56 vs. limit=22.5 2024-08-12 20:07:41,214 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 20:07:50,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1802980.0, ans=0.125 2024-08-12 20:07:51,729 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 20:07:56,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1803080.0, ans=0.0 2024-08-12 20:07:57,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6400, loss[loss=0.1033, beats_loss=0.01248, ecapa_loss=0.0001349, whisper_loss=0.08946, over 22975.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01106, ecapa_loss=0.000171, whisper_loss=0.09143, over 3849088.27 frames. ], batch size: 90, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:07:58,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1803080.0, ans=0.2 2024-08-12 20:08:01,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2024-08-12 20:08:12,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2024-08-12 20:08:13,602 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 20:08:13,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1803180.0, ans=0.125 2024-08-12 20:08:24,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.558e+01 2.846e+01 3.413e+01 1.173e+02, threshold=5.692e+01, percent-clipped=2.0 2024-08-12 20:08:25,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1803280.0, ans=0.1 2024-08-12 20:08:27,893 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 20:08:30,513 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 32 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-12 20:08:32,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1803280.0, ans=0.125 2024-08-12 20:08:38,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1803380.0, ans=0.125 2024-08-12 20:08:45,676 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 20:09:08,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6450, loss[loss=0.1073, beats_loss=0.009867, ecapa_loss=0.0001605, whisper_loss=0.0958, over 17780.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01106, ecapa_loss=0.0001701, whisper_loss=0.09223, over 3916737.44 frames. ], batch size: 70, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:09:13,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=15.0 2024-08-12 20:09:22,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1803680.0, ans=0.125 2024-08-12 20:09:24,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-08-12 20:09:41,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1803780.0, ans=0.0 2024-08-12 20:09:59,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.08 vs. limit=22.5 2024-08-12 20:10:01,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1803880.0, ans=0.2 2024-08-12 20:10:11,129 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-12 20:10:16,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=12.0 2024-08-12 20:10:20,051 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6500, loss[loss=0.09179, beats_loss=0.01055, ecapa_loss=0.0001598, whisper_loss=0.07964, over 21777.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001712, whisper_loss=0.09222, over 3926043.75 frames. ], batch size: 88, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:10:23,172 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-12 20:10:23,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1804080.0, ans=0.2 2024-08-12 20:10:39,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1804180.0, ans=0.025 2024-08-12 20:10:40,607 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 20:10:46,196 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 20:10:48,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.417e+01 2.617e+01 2.819e+01 4.970e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-12 20:11:11,487 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 20:11:11,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1804380.0, ans=0.125 2024-08-12 20:11:16,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1804480.0, ans=0.125 2024-08-12 20:11:16,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1804480.0, ans=0.1 2024-08-12 20:11:30,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6550, loss[loss=0.102, beats_loss=0.01001, ecapa_loss=0.0002265, whisper_loss=0.08968, over 22018.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01084, ecapa_loss=0.0001722, whisper_loss=0.09271, over 3920964.41 frames. 
], batch size: 92, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:11:35,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1804580.0, ans=0.0 2024-08-12 20:11:47,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1804680.0, ans=0.0 2024-08-12 20:11:54,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1804680.0, ans=0.09899494936611666 2024-08-12 20:11:58,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.63 vs. limit=10.0 2024-08-12 20:11:58,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1804780.0, ans=0.0 2024-08-12 20:12:02,389 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-12 20:12:02,614 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:12:06,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1804780.0, ans=0.0 2024-08-12 20:12:12,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0 2024-08-12 20:12:16,635 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 20:12:22,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-12 20:12:26,006 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
31 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 20:12:32,735 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-12 20:12:39,726 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6600, loss[loss=0.1004, beats_loss=0.009849, ecapa_loss=0.0001682, whisper_loss=0.08888, over 22396.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01086, ecapa_loss=0.0001734, whisper_loss=0.09267, over 3907933.58 frames. ], batch size: 93, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:12:42,554 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 20:12:49,307 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 20:13:04,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1805180.0, ans=0.015 2024-08-12 20:13:06,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.480e+01 2.766e+01 3.110e+01 5.063e+01, threshold=5.533e+01, percent-clipped=0.0 2024-08-12 20:13:25,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1805380.0, ans=0.125 2024-08-12 20:13:38,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1805480.0, ans=0.125 2024-08-12 20:13:45,184 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 13 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 20:13:46,649 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 20:13:47,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6650, loss[loss=0.08852, beats_loss=0.01041, ecapa_loss=0.0001655, whisper_loss=0.07646, over 18300.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001741, whisper_loss=0.09146, over 3909493.49 frames. 
], batch size: 75, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:13:53,253 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 20:14:29,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1805880.0, ans=0.125 2024-08-12 20:14:30,704 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 20:14:56,304 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6700, loss[loss=0.1196, beats_loss=0.008707, ecapa_loss=0.0001799, whisper_loss=0.1091, over 22651.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01097, ecapa_loss=0.0001745, whisper_loss=0.09144, over 3919447.10 frames. ], batch size: 87, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:15:04,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0 2024-08-12 20:15:06,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1806080.0, ans=0.05 2024-08-12 20:15:13,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1806180.0, ans=0.0 2024-08-12 20:15:15,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1806180.0, ans=0.2 2024-08-12 20:15:23,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.568e+01 2.820e+01 3.306e+01 6.884e+01, threshold=5.641e+01, percent-clipped=3.0 2024-08-12 20:15:23,762 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 16 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-12 20:15:26,591 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 20:15:39,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1806380.0, ans=0.125 2024-08-12 20:15:45,100 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 39 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-12 20:15:50,403 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-12 20:15:53,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1806480.0, ans=10.0 2024-08-12 20:15:53,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1806480.0, ans=0.0 2024-08-12 20:15:56,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1806480.0, ans=0.1 2024-08-12 20:16:02,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2024-08-12 20:16:05,713 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6750, loss[loss=0.1139, beats_loss=0.01122, ecapa_loss=0.0001527, whisper_loss=0.1012, over 18184.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01085, ecapa_loss=0.0001735, whisper_loss=0.09221, over 3919231.38 frames. ], batch size: 74, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:16:20,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1806680.0, ans=0.125 2024-08-12 20:16:40,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. 
limit=15.0 2024-08-12 20:16:48,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1806880.0, ans=0.0 2024-08-12 20:16:53,696 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 20:16:59,521 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-12 20:17:15,487 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6800, loss[loss=0.0848, beats_loss=0.01138, ecapa_loss=0.0001486, whisper_loss=0.07193, over 19488.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01091, ecapa_loss=0.0001724, whisper_loss=0.09147, over 3902382.03 frames. ], batch size: 77, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:17:30,975 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-12 20:17:43,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.433e+01 2.678e+01 3.224e+01 5.136e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-12 20:17:49,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=12.0 2024-08-12 20:17:51,727 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-12 20:17:58,349 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-12 20:18:11,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1807480.0, ans=0.2 2024-08-12 20:18:16,382 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 20:18:24,714 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6850, loss[loss=0.1197, beats_loss=0.009461, ecapa_loss=0.000167, whisper_loss=0.1085, over 19459.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001741, whisper_loss=0.09114, over 3870039.34 frames. ], batch size: 74, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:18:25,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1807580.0, ans=0.1 2024-08-12 20:18:40,887 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 12 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 20:18:49,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1807680.0, ans=0.125 2024-08-12 20:18:49,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1807680.0, ans=0.125 2024-08-12 20:18:54,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1807780.0, ans=0.2 2024-08-12 20:19:01,400 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 36 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-12 20:19:09,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1807880.0, ans=0.0 2024-08-12 20:19:27,346 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-12 20:19:33,783 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6900, loss[loss=0.0865, beats_loss=0.01205, ecapa_loss=0.0001604, whisper_loss=0.07285, over 17032.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001749, whisper_loss=0.09114, over 3869989.25 frames. 
], batch size: 68, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:19:34,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1808080.0, ans=0.125 2024-08-12 20:19:37,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1808080.0, ans=0.125 2024-08-12 20:19:51,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1808180.0, ans=0.0 2024-08-12 20:20:01,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.402e+01 2.709e+01 3.139e+01 1.091e+02, threshold=5.419e+01, percent-clipped=1.0 2024-08-12 20:20:05,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-08-12 20:20:27,206 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 20:20:27,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1808480.0, ans=0.125 2024-08-12 20:20:41,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 6950, loss[loss=0.1093, beats_loss=0.01313, ecapa_loss=0.0001541, whisper_loss=0.09463, over 23465.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001747, whisper_loss=0.09186, over 3871280.60 frames. ], batch size: 91, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:20:50,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1808580.0, ans=0.0 2024-08-12 20:21:05,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1808680.0, ans=0.125 2024-08-12 20:21:11,680 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
16 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 20:21:16,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1808780.0, ans=0.125 2024-08-12 20:21:26,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2024-08-12 20:21:34,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1808880.0, ans=0.125 2024-08-12 20:21:42,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1808980.0, ans=0.2 2024-08-12 20:21:42,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1808980.0, ans=0.125 2024-08-12 20:21:52,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7000, loss[loss=0.07561, beats_loss=0.01381, ecapa_loss=0.0001453, whisper_loss=0.06035, over 15125.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001741, whisper_loss=0.09169, over 3843568.24 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:22:18,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2024-08-12 20:22:19,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.381e+01 2.667e+01 3.091e+01 4.298e+01, threshold=5.335e+01, percent-clipped=0.0 2024-08-12 20:22:23,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1809280.0, ans=0.05 2024-08-12 20:22:29,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1809280.0, ans=0.2 2024-08-12 20:22:35,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1809380.0, ans=0.125 2024-08-12 20:22:58,861 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 17 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-12 20:23:01,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7050, loss[loss=0.1009, beats_loss=0.01086, ecapa_loss=0.0001689, whisper_loss=0.08835, over 19528.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001748, whisper_loss=0.09137, over 3870729.74 frames. ], batch size: 80, lr: 4.85e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:23:03,020 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-12 20:23:14,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-12 20:23:26,229 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-12 20:23:51,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1809880.0, ans=0.1 2024-08-12 20:23:51,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2024-08-12 20:23:53,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2024-08-12 20:24:01,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1809980.0, ans=0.125 2024-08-12 20:24:10,822 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7100, loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0002047, whisper_loss=0.09091, over 18711.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01088, ecapa_loss=0.0001742, whisper_loss=0.09231, over 3887041.08 frames. ], batch size: 81, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:24:19,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1810080.0, ans=0.125 2024-08-12 20:24:20,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.0 2024-08-12 20:24:23,827 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-12 20:24:25,115 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-12 20:24:29,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1810180.0, ans=0.0 2024-08-12 20:24:33,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1810180.0, ans=0.125 2024-08-12 20:24:38,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.554e+01 2.752e+01 3.133e+01 4.741e+01, threshold=5.504e+01, percent-clipped=0.0 2024-08-12 20:24:59,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1810380.0, ans=0.0 2024-08-12 20:25:11,323 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 20:25:19,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7150, loss[loss=0.09688, beats_loss=0.01165, ecapa_loss=0.0002212, whisper_loss=0.08302, over 20193.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01086, ecapa_loss=0.0001742, whisper_loss=0.09243, over 3868700.04 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:25:25,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2024-08-12 20:25:49,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5 2024-08-12 20:25:54,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1810780.0, ans=0.0 2024-08-12 20:26:00,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.68 vs. 
limit=12.0 2024-08-12 20:26:05,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1810880.0, ans=0.1 2024-08-12 20:26:15,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1810980.0, ans=0.125 2024-08-12 20:26:15,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1810980.0, ans=0.125 2024-08-12 20:26:15,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1810980.0, ans=0.125 2024-08-12 20:26:28,800 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7200, loss[loss=0.1181, beats_loss=0.01186, ecapa_loss=0.0001569, whisper_loss=0.1047, over 23956.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.0109, ecapa_loss=0.0001732, whisper_loss=0.09212, over 3877966.06 frames. ], batch size: 93, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:26:30,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1811080.0, ans=0.0 2024-08-12 20:26:33,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1811080.0, ans=0.125 2024-08-12 20:26:34,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1811080.0, ans=0.1 2024-08-12 20:26:45,231 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 20:26:53,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. 
limit=22.5 2024-08-12 20:26:55,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.470e+01 2.758e+01 3.060e+01 4.587e+01, threshold=5.516e+01, percent-clipped=0.0 2024-08-12 20:27:22,197 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 20:27:26,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2024-08-12 20:27:27,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1811480.0, ans=0.2 2024-08-12 20:27:30,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1811480.0, ans=0.1 2024-08-12 20:27:37,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7250, loss[loss=0.08009, beats_loss=0.007735, ecapa_loss=0.0002382, whisper_loss=0.06997, over 13130.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01082, ecapa_loss=0.0001737, whisper_loss=0.09233, over 3897992.86 frames. ], batch size: 53, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:27:37,315 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-12 20:27:46,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1811580.0, ans=0.1 2024-08-12 20:28:00,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1811680.0, ans=0.0 2024-08-12 20:28:02,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1811680.0, ans=0.0 2024-08-12 20:28:07,456 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 20:28:07,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1811780.0, ans=0.0 2024-08-12 20:28:12,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1811780.0, ans=0.0 2024-08-12 20:28:24,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-08-12 20:28:25,227 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 24 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-12 20:28:25,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-08-12 20:28:27,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1811880.0, ans=0.125 2024-08-12 20:28:35,258 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 20:28:47,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7300, loss[loss=0.1022, beats_loss=0.01095, ecapa_loss=0.0002274, whisper_loss=0.08896, over 20862.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01083, ecapa_loss=0.000174, whisper_loss=0.09254, over 3868733.14 frames. 
], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:29:14,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.458e+01 2.787e+01 3.037e+01 3.790e+01, threshold=5.575e+01, percent-clipped=0.0 2024-08-12 20:29:26,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1812280.0, ans=0.0 2024-08-12 20:29:32,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1812380.0, ans=0.0 2024-08-12 20:29:41,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1812480.0, ans=0.0 2024-08-12 20:29:47,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1812480.0, ans=0.04949747468305833 2024-08-12 20:29:56,528 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7350, loss[loss=0.1119, beats_loss=0.008779, ecapa_loss=0.0001947, whisper_loss=0.1012, over 16285.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001746, whisper_loss=0.09159, over 3832886.80 frames. ], batch size: 67, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:30:03,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1812580.0, ans=0.125 2024-08-12 20:30:03,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1812580.0, ans=0.1 2024-08-12 20:30:08,746 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 20:30:14,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. 
limit=22.5 2024-08-12 20:30:19,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1812680.0, ans=0.125 2024-08-12 20:30:36,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1812880.0, ans=0.5 2024-08-12 20:30:45,266 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 20:30:53,921 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-12 20:31:03,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1813080.0, ans=0.125 2024-08-12 20:31:04,883 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7400, loss[loss=0.1018, beats_loss=0.01064, ecapa_loss=0.0001757, whisper_loss=0.08944, over 14132.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001756, whisper_loss=0.0917, over 3839225.89 frames. ], batch size: 55, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:31:06,828 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:31:09,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1813080.0, ans=0.0 2024-08-12 20:31:13,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=12.0 2024-08-12 20:31:17,638 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-12 20:31:20,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1813180.0, ans=0.125 2024-08-12 20:31:22,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1813180.0, ans=0.0 2024-08-12 20:31:32,382 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.493e+01 2.726e+01 3.079e+01 4.243e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 20:31:50,542 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-12 20:31:57,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1813380.0, ans=0.125 2024-08-12 20:32:00,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1813480.0, ans=0.125 2024-08-12 20:32:08,101 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-12 20:32:13,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7450, loss[loss=0.1111, beats_loss=0.01026, ecapa_loss=0.0001684, whisper_loss=0.09914, over 16653.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01088, ecapa_loss=0.0001754, whisper_loss=0.09225, over 3853039.69 frames. ], batch size: 63, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:32:42,416 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-12 20:32:42,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1813780.0, ans=0.125 2024-08-12 20:32:45,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. 
limit=15.0 2024-08-12 20:32:46,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-12 20:32:50,455 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 20:33:10,619 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-12 20:33:21,745 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7500, loss[loss=0.09228, beats_loss=0.01166, ecapa_loss=0.0001215, whisper_loss=0.07941, over 14696.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001735, whisper_loss=0.09165, over 3840788.60 frames. ], batch size: 53, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:33:24,786 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 20:33:28,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1814080.0, ans=0.1 2024-08-12 20:33:29,082 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:33:37,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0 2024-08-12 20:33:47,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1814180.0, ans=0.125 2024-08-12 20:33:49,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.399e+01 2.676e+01 3.018e+01 5.657e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-12 20:33:56,224 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
35 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 20:34:03,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1814380.0, ans=0.0 2024-08-12 20:34:03,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=1814380.0, ans=12.0 2024-08-12 20:34:13,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1814380.0, ans=0.125 2024-08-12 20:34:31,176 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7550, loss[loss=0.1074, beats_loss=0.01309, ecapa_loss=0.0001691, whisper_loss=0.0926, over 22106.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01086, ecapa_loss=0.0001736, whisper_loss=0.09249, over 3855023.53 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:34:31,350 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 20:34:36,825 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-12 20:34:55,936 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 20:35:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-12 20:35:15,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1814880.0, ans=0.125 2024-08-12 20:35:16,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-08-12 20:35:25,645 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:35:29,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1814980.0, ans=10.0 2024-08-12 20:35:36,942 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 11 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 20:35:40,637 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7600, loss[loss=0.09071, beats_loss=0.00888, ecapa_loss=0.0001823, whisper_loss=0.08, over 18611.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.0001734, whisper_loss=0.09147, over 3878735.65 frames. ], batch size: 68, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:35:43,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1815080.0, ans=0.0 2024-08-12 20:36:04,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1815180.0, ans=0.0 2024-08-12 20:36:08,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.063e+01 2.568e+01 2.871e+01 3.338e+01 1.735e+02, threshold=5.742e+01, percent-clipped=2.0 2024-08-12 20:36:29,725 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-12 20:36:50,543 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7650, loss[loss=0.1291, beats_loss=0.00931, ecapa_loss=0.0001775, whisper_loss=0.118, over 22668.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01079, ecapa_loss=0.0001728, whisper_loss=0.09228, over 3878340.39 frames. 
], batch size: 90, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:36:57,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1815580.0, ans=0.035 2024-08-12 20:36:57,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1815580.0, ans=0.125 2024-08-12 20:37:23,890 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 20:37:28,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1815780.0, ans=0.07 2024-08-12 20:37:29,508 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 20:37:29,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1815780.0, ans=0.07 2024-08-12 20:37:36,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.61 vs. limit=22.5 2024-08-12 20:37:46,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1815980.0, ans=0.125 2024-08-12 20:37:54,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1815980.0, ans=0.125 2024-08-12 20:37:59,928 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7700, loss[loss=0.1406, beats_loss=0.006757, ecapa_loss=0.0001913, whisper_loss=0.132, over 23149.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01082, ecapa_loss=0.0001735, whisper_loss=0.09203, over 3896512.71 frames. ], batch size: 88, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:38:07,007 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
20 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-12 20:38:24,916 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 40 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 20:38:27,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.538e+01 2.763e+01 3.264e+01 5.327e+01, threshold=5.526e+01, percent-clipped=0.0 2024-08-12 20:38:37,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1816280.0, ans=0.2 2024-08-12 20:38:40,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1816380.0, ans=0.0 2024-08-12 20:38:42,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1816380.0, ans=0.0 2024-08-12 20:38:50,181 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 20:38:54,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-08-12 20:38:58,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1816480.0, ans=10.0 2024-08-12 20:39:08,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1816580.0, ans=0.125 2024-08-12 20:39:08,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7750, loss[loss=0.09303, beats_loss=0.0126, ecapa_loss=0.0001551, whisper_loss=0.07888, over 19039.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001728, whisper_loss=0.092, over 3916640.64 frames. 
], batch size: 76, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:39:13,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1816580.0, ans=0.125 2024-08-12 20:39:18,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1816580.0, ans=0.0 2024-08-12 20:39:20,449 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-12 20:39:26,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-12 20:39:30,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=1816680.0, ans=0.1 2024-08-12 20:39:33,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1816680.0, ans=0.0 2024-08-12 20:39:42,496 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 20:39:46,945 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 20:39:49,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1816880.0, ans=0.0 2024-08-12 20:39:59,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1816880.0, ans=0.125 2024-08-12 20:40:18,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7800, loss[loss=0.1141, beats_loss=0.01195, ecapa_loss=0.0001241, whisper_loss=0.1009, over 23063.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01085, ecapa_loss=0.0001715, whisper_loss=0.09255, over 3925323.10 frames. 
], batch size: 89, lr: 4.84e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:40:36,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=1817180.0, ans=0.05 2024-08-12 20:40:42,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-08-12 20:40:45,815 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.560e+01 2.836e+01 3.091e+01 4.411e+01, threshold=5.671e+01, percent-clipped=0.0 2024-08-12 20:41:01,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1817380.0, ans=0.125 2024-08-12 20:41:06,610 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-12 20:41:09,414 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.447e-03 2024-08-12 20:41:12,048 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-12 20:41:12,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1817480.0, ans=0.2 2024-08-12 20:41:18,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1817480.0, ans=0.2 2024-08-12 20:41:20,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=22.5 2024-08-12 20:41:27,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7850, loss[loss=0.1147, beats_loss=0.01046, ecapa_loss=0.0001506, whisper_loss=0.1027, over 23107.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001714, whisper_loss=0.09262, over 3923303.30 frames. 
], batch size: 89, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:41:37,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1817580.0, ans=0.0 2024-08-12 20:41:57,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1817780.0, ans=0.1 2024-08-12 20:41:58,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1817780.0, ans=0.125 2024-08-12 20:42:01,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1817780.0, ans=0.0 2024-08-12 20:42:36,558 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7900, loss[loss=0.09876, beats_loss=0.01134, ecapa_loss=0.0001841, whisper_loss=0.08558, over 17862.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.011, ecapa_loss=0.0001706, whisper_loss=0.09192, over 3881549.25 frames. ], batch size: 73, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:43:02,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-12 20:43:02,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2024-08-12 20:43:03,106 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
10 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-12 20:43:04,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.497e+01 2.722e+01 3.152e+01 4.641e+01, threshold=5.444e+01, percent-clipped=0.0 2024-08-12 20:43:04,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1818280.0, ans=0.1 2024-08-12 20:43:10,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1818280.0, ans=0.125 2024-08-12 20:43:23,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1818380.0, ans=0.125 2024-08-12 20:43:33,311 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-12 20:43:45,593 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 7950, loss[loss=0.07652, beats_loss=0.01466, ecapa_loss=0.0001325, whisper_loss=0.06054, over 16566.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01107, ecapa_loss=0.0001696, whisper_loss=0.09179, over 3878277.64 frames. ], batch size: 68, lr: 4.83e-03, grad_scale: 5.764607523034235e+17 2024-08-12 20:44:01,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2024-08-12 20:44:10,647 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-12 20:44:17,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1818780.0, ans=0.0 2024-08-12 20:44:19,883 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
13 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 20:44:20,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1818780.0, ans=0.125 2024-08-12 20:44:28,413 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-12 20:44:48,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1818980.0, ans=0.2 2024-08-12 20:44:54,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2024-08-12 20:44:55,009 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8000, loss[loss=0.09966, beats_loss=0.01378, ecapa_loss=0.0001449, whisper_loss=0.08443, over 13965.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.0001699, whisper_loss=0.09165, over 3870085.93 frames. ], batch size: 57, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:44:55,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1819080.0, ans=0.125 2024-08-12 20:45:03,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1819080.0, ans=0.2 2024-08-12 20:45:04,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1819080.0, ans=0.1 2024-08-12 20:45:05,911 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-12 20:45:13,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1819180.0, ans=0.125 2024-08-12 20:45:17,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1819180.0, ans=0.2 2024-08-12 20:45:22,512 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.456e+01 2.721e+01 3.092e+01 4.967e+01, threshold=5.442e+01, percent-clipped=0.0 2024-08-12 20:45:25,312 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 20:45:41,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1819380.0, ans=0.125 2024-08-12 20:46:04,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8050, loss[loss=0.08913, beats_loss=0.01114, ecapa_loss=0.0001648, whisper_loss=0.07634, over 13939.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001703, whisper_loss=0.09157, over 3844943.62 frames. ], batch size: 55, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:46:07,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1819580.0, ans=0.125 2024-08-12 20:46:15,635 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
32 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 20:46:37,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1819780.0, ans=0.125 2024-08-12 20:46:52,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1819880.0, ans=0.2 2024-08-12 20:46:55,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1819880.0, ans=0.1 2024-08-12 20:47:13,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8100, loss[loss=0.09725, beats_loss=0.01277, ecapa_loss=0.0001756, whisper_loss=0.08272, over 21808.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001719, whisper_loss=0.0919, over 3888830.38 frames. ], batch size: 90, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:47:31,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1820180.0, ans=0.1 2024-08-12 20:47:36,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1820180.0, ans=0.2 2024-08-12 20:47:40,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.501e+01 2.882e+01 3.230e+01 4.763e+01, threshold=5.764e+01, percent-clipped=0.0 2024-08-12 20:47:46,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1820280.0, ans=0.125 2024-08-12 20:48:03,049 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-12 20:48:22,285 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8150, loss[loss=0.09297, beats_loss=0.01496, ecapa_loss=0.000136, whisper_loss=0.07665, over 21293.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001716, whisper_loss=0.09163, over 3870267.24 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:48:28,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-12 20:48:54,559 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-12 20:49:15,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1820880.0, ans=0.125 2024-08-12 20:49:18,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1820980.0, ans=0.1 2024-08-12 20:49:22,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1820980.0, ans=0.0 2024-08-12 20:49:23,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1820980.0, ans=0.125 2024-08-12 20:49:27,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1820980.0, ans=0.0 2024-08-12 20:49:31,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8200, loss[loss=0.06245, beats_loss=0.01454, ecapa_loss=0.0001141, whisper_loss=0.04677, over 18565.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01107, ecapa_loss=0.0001721, whisper_loss=0.09067, over 3915193.62 frames. 
], batch size: 72, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:49:33,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1821080.0, ans=0.125 2024-08-12 20:49:40,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1821080.0, ans=0.05 2024-08-12 20:49:47,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1821180.0, ans=0.0 2024-08-12 20:49:57,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-08-12 20:49:59,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+01 2.516e+01 2.770e+01 3.136e+01 5.305e+01, threshold=5.540e+01, percent-clipped=0.0 2024-08-12 20:49:59,724 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-12 20:50:11,984 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-12 20:50:29,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1821480.0, ans=0.125 2024-08-12 20:50:34,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-12 20:50:38,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1821480.0, ans=0.2 2024-08-12 20:50:40,576 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8250, loss[loss=0.11, beats_loss=0.01149, ecapa_loss=0.0001381, whisper_loss=0.0971, over 23494.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01106, ecapa_loss=0.0001715, whisper_loss=0.091, over 3887485.67 frames. 
], batch size: 93, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:50:52,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1821580.0, ans=0.0 2024-08-12 20:50:54,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=12.0 2024-08-12 20:50:55,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=12.0 2024-08-12 20:51:00,343 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 25 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 20:51:13,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1821780.0, ans=0.125 2024-08-12 20:51:15,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=1821780.0, ans=15.0 2024-08-12 20:51:16,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-12 20:51:28,426 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 20:51:50,232 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8300, loss[loss=0.09215, beats_loss=0.01038, ecapa_loss=0.000171, whisper_loss=0.08006, over 16393.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01105, ecapa_loss=0.0001698, whisper_loss=0.09094, over 3878564.05 frames. ], batch size: 64, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:51:50,722 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-12 20:51:51,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1822080.0, ans=0.2 2024-08-12 20:51:57,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-12 20:52:02,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-12 20:52:17,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.463e+01 2.692e+01 3.120e+01 9.968e+01, threshold=5.383e+01, percent-clipped=3.0 2024-08-12 20:52:22,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1822280.0, ans=0.015 2024-08-12 20:52:22,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1822280.0, ans=0.0 2024-08-12 20:52:23,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1822280.0, ans=0.125 2024-08-12 20:52:46,181 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-12 20:52:58,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8350, loss[loss=0.0911, beats_loss=0.009941, ecapa_loss=0.0002131, whisper_loss=0.07902, over 19207.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01105, ecapa_loss=0.0001702, whisper_loss=0.09078, over 3899128.12 frames. ], batch size: 81, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:53:02,266 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-12 20:53:05,045 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-12 20:53:05,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2024-08-12 20:53:06,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1822580.0, ans=0.0 2024-08-12 20:53:38,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1822880.0, ans=0.125 2024-08-12 20:54:07,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8400, loss[loss=0.09339, beats_loss=0.01186, ecapa_loss=0.0001533, whisper_loss=0.07999, over 20662.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01093, ecapa_loss=0.0001722, whisper_loss=0.09137, over 3914116.69 frames. ], batch size: 82, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:54:22,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-12 20:54:25,024 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-12 20:54:25,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1823180.0, ans=0.0 2024-08-12 20:54:29,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1823180.0, ans=0.2 2024-08-12 20:54:34,918 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-12 20:54:35,960 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.526e+01 2.875e+01 3.220e+01 4.758e+01, threshold=5.750e+01, percent-clipped=0.0 2024-08-12 20:54:37,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2024-08-12 20:54:44,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-12 20:54:49,474 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 20:54:51,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1823380.0, ans=0.0 2024-08-12 20:54:56,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1823380.0, ans=0.125 2024-08-12 20:54:58,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-12 20:55:13,686 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 31 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-12 20:55:18,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-12 20:55:18,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8450, loss[loss=0.103, beats_loss=0.009538, ecapa_loss=0.0001999, whisper_loss=0.09151, over 22806.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001724, whisper_loss=0.09132, over 3906898.59 frames. ], batch size: 91, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:55:36,484 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
16 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-12 20:55:37,924 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 20:55:40,505 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-12 20:55:57,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1823780.0, ans=0.0 2024-08-12 20:56:11,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-12 20:56:22,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-12 20:56:24,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.86 vs. limit=22.5 2024-08-12 20:56:27,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-12 20:56:31,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8500, loss[loss=0.08734, beats_loss=0.01155, ecapa_loss=0.0001474, whisper_loss=0.07431, over 18300.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001719, whisper_loss=0.09173, over 3929236.69 frames. ], batch size: 72, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:56:45,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. 
limit=15.0 2024-08-12 20:57:01,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.569e+01 2.792e+01 3.196e+01 4.300e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-12 20:57:06,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1824280.0, ans=0.0 2024-08-12 20:57:19,349 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-12 20:57:20,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1824380.0, ans=0.1 2024-08-12 20:57:38,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1824480.0, ans=0.125 2024-08-12 20:57:40,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-12 20:57:46,081 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8550, loss[loss=0.1076, beats_loss=0.009753, ecapa_loss=0.0001684, whisper_loss=0.09621, over 18153.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01089, ecapa_loss=0.0001709, whisper_loss=0.09226, over 3932576.51 frames. ], batch size: 69, lr: 4.83e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:57:46,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1824580.0, ans=0.125 2024-08-12 20:58:02,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1824680.0, ans=0.125 2024-08-12 20:58:50,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2024-08-12 20:58:53,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-12 20:58:58,972 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8600, loss[loss=0.1153, beats_loss=0.01047, ecapa_loss=0.0001634, whisper_loss=0.1031, over 19661.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001696, whisper_loss=0.09263, over 3903764.63 frames. ], batch size: 75, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 20:59:04,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1825080.0, ans=0.04949747468305833 2024-08-12 20:59:09,853 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-12 20:59:12,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1825080.0, ans=0.125 2024-08-12 20:59:17,623 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 20:59:29,864 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-12 20:59:31,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.497e+01 2.777e+01 3.095e+01 5.281e+01, threshold=5.554e+01, percent-clipped=0.0 2024-08-12 20:59:51,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=1825380.0, ans=15.0 2024-08-12 20:59:56,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1825380.0, ans=0.125 2024-08-12 21:00:06,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1825480.0, ans=0.2 2024-08-12 21:00:12,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1825480.0, ans=0.125 2024-08-12 21:00:13,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1825480.0, ans=0.125 2024-08-12 21:00:17,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8650, loss[loss=0.09043, beats_loss=0.0116, ecapa_loss=0.0001333, whisper_loss=0.07749, over 16583.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01096, ecapa_loss=0.0001697, whisper_loss=0.09218, over 3903144.78 frames. ], batch size: 62, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:00:27,128 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 21:00:48,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1825780.0, ans=0.0 2024-08-12 21:00:56,812 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 32 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-12 21:01:02,785 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-12 21:01:25,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1825980.0, ans=0.1 2024-08-12 21:01:30,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1825980.0, ans=0.0 2024-08-12 21:01:33,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8700, loss[loss=0.1032, beats_loss=0.01357, ecapa_loss=0.0001632, whisper_loss=0.08803, over 22023.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001704, whisper_loss=0.09186, over 3908596.65 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:01:42,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1826080.0, ans=0.125 2024-08-12 21:01:44,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1826080.0, ans=0.1 2024-08-12 21:01:48,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1826180.0, ans=0.125 2024-08-12 21:01:48,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1826180.0, ans=0.125 2024-08-12 21:02:04,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.617e+01 2.806e+01 3.109e+01 1.024e+02, threshold=5.612e+01, percent-clipped=1.0 2024-08-12 21:02:14,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1826280.0, ans=0.0 2024-08-12 21:02:15,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1826280.0, ans=0.0 2024-08-12 21:02:33,448 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1826480.0, ans=0.125 2024-08-12 21:02:41,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1826480.0, ans=0.2 2024-08-12 21:02:50,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8750, loss[loss=0.08442, beats_loss=0.01127, ecapa_loss=0.0002125, whisper_loss=0.07102, over 20806.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01097, ecapa_loss=0.00017, whisper_loss=0.0911, over 3860359.33 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:03:00,372 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 21:03:00,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1826580.0, ans=0.0 2024-08-12 21:03:21,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2024-08-12 21:03:29,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1826780.0, ans=0.125 2024-08-12 21:03:29,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0 2024-08-12 21:03:37,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1826880.0, ans=0.0 2024-08-12 21:03:48,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1826880.0, ans=0.125 2024-08-12 21:03:49,228 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-12 21:03:59,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1826980.0, ans=0.04949747468305833 2024-08-12 21:04:03,628 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 21:04:08,027 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8800, loss[loss=0.08637, beats_loss=0.01009, ecapa_loss=0.0001573, whisper_loss=0.07471, over 13962.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001713, whisper_loss=0.09154, over 3878080.48 frames. ], batch size: 55, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:04:31,841 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:04:39,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.528e+01 2.804e+01 3.159e+01 1.036e+02, threshold=5.609e+01, percent-clipped=2.0 2024-08-12 21:04:45,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1827280.0, ans=10.0 2024-08-12 21:05:14,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1827480.0, ans=0.1 2024-08-12 21:05:26,721 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8850, loss[loss=0.1219, beats_loss=0.007332, ecapa_loss=0.0002161, whisper_loss=0.1124, over 14749.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001697, whisper_loss=0.09157, over 3882840.22 frames. ], batch size: 59, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:05:32,057 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
23 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 21:05:33,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1827580.0, ans=0.125 2024-08-12 21:05:52,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1827680.0, ans=0.2 2024-08-12 21:05:53,225 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-12 21:06:01,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1827780.0, ans=0.07 2024-08-12 21:06:09,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1827780.0, ans=0.125 2024-08-12 21:06:11,115 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-12 21:06:16,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=15.0 2024-08-12 21:06:23,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1827880.0, ans=0.1 2024-08-12 21:06:35,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.04 vs. limit=10.0 2024-08-12 21:06:42,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8900, loss[loss=0.1072, beats_loss=0.01118, ecapa_loss=0.000156, whisper_loss=0.09444, over 20486.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01103, ecapa_loss=0.000169, whisper_loss=0.09139, over 3874024.95 frames. ], batch size: 81, lr: 4.82e-03, grad_scale: 1.152921504606847e+18 2024-08-12 21:06:42,999 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
13 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 21:06:54,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1828080.0, ans=0.125 2024-08-12 21:07:15,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.512e+01 2.855e+01 3.103e+01 6.109e+01, threshold=5.710e+01, percent-clipped=1.0 2024-08-12 21:07:16,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=12.0 2024-08-12 21:07:19,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1828280.0, ans=0.2 2024-08-12 21:07:23,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1828280.0, ans=0.95 2024-08-12 21:07:26,310 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 21:07:59,890 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 8950, loss[loss=0.09295, beats_loss=0.0118, ecapa_loss=0.0001759, whisper_loss=0.07939, over 22337.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001699, whisper_loss=0.09182, over 3872731.59 frames. ], batch size: 92, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:08:00,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1828580.0, ans=0.1 2024-08-12 21:08:13,624 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 21:08:22,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.78 vs. 
limit=10.0 2024-08-12 21:08:32,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1828780.0, ans=0.2 2024-08-12 21:08:57,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1828880.0, ans=0.015 2024-08-12 21:09:06,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1828980.0, ans=0.2 2024-08-12 21:09:16,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9000, loss[loss=0.1154, beats_loss=0.009719, ecapa_loss=0.0001742, whisper_loss=0.104, over 16463.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001703, whisper_loss=0.09194, over 3856179.98 frames. ], batch size: 64, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:09:16,135 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 21:09:54,929 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005776, whisper_loss=0.2483, over 922467.00 frames. 2024-08-12 21:10:13,779 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on SV_voxceleb1: loss=0.004711, beats_loss=0, ecapa_loss=0.0004711, whisper_loss=0, over 939242.00 frames. 2024-08-12 21:12:02,746 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 21:12:02,750 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 21:12:04,421 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
19 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-12 21:12:27,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.228e-02 2024-08-12 21:12:31,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-12 21:12:37,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.414e+01 2.685e+01 3.059e+01 6.063e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-12 21:12:56,703 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 34 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-12 21:13:03,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1829380.0, ans=0.2 2024-08-12 21:13:08,656 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 33 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 21:13:22,657 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9050, loss[loss=0.1255, beats_loss=0.007525, ecapa_loss=0.0001942, whisper_loss=0.116, over 17649.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001701, whisper_loss=0.09224, over 3889765.69 frames. 
], batch size: 69, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:13:33,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1829580.0, ans=0.0 2024-08-12 21:13:34,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1829580.0, ans=0.2 2024-08-12 21:13:37,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1829680.0, ans=0.125 2024-08-12 21:13:50,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1829680.0, ans=0.125 2024-08-12 21:13:58,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1829780.0, ans=0.04949747468305833 2024-08-12 21:13:58,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1829780.0, ans=0.1 2024-08-12 21:13:59,778 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 21:14:01,137 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 14 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-12 21:14:19,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1829880.0, ans=0.1 2024-08-12 21:14:30,249 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 21:14:33,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1829980.0, ans=0.09899494936611666 2024-08-12 21:14:35,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.14 vs. 
limit=15.0 2024-08-12 21:14:38,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9100, loss[loss=0.1208, beats_loss=0.009157, ecapa_loss=0.0001875, whisper_loss=0.1098, over 22465.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01083, ecapa_loss=0.000172, whisper_loss=0.09263, over 3875728.67 frames. ], batch size: 88, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:14:39,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2024-08-12 21:14:39,983 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 21:14:45,031 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 21:14:50,627 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-12 21:14:52,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1830180.0, ans=0.125 2024-08-12 21:14:55,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1830180.0, ans=0.1 2024-08-12 21:15:11,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.477e+01 2.788e+01 3.055e+01 6.197e+01, threshold=5.576e+01, percent-clipped=1.0 2024-08-12 21:15:23,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1830280.0, ans=0.0 2024-08-12 21:15:25,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1830380.0, ans=0.0 2024-08-12 21:15:56,210 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9150, loss[loss=0.1053, beats_loss=0.01154, ecapa_loss=0.0001341, whisper_loss=0.09242, over 21959.00 frames. 
], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001713, whisper_loss=0.09189, over 3869991.05 frames. ], batch size: 83, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:16:05,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1830580.0, ans=0.0 2024-08-12 21:16:17,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1830680.0, ans=0.125 2024-08-12 21:16:25,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:16:34,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1830780.0, ans=0.125 2024-08-12 21:16:54,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1830880.0, ans=0.1 2024-08-12 21:17:04,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2024-08-12 21:17:10,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9200, loss[loss=0.11, beats_loss=0.01019, ecapa_loss=0.0001762, whisper_loss=0.09809, over 16741.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001719, whisper_loss=0.09195, over 3874004.72 frames. ], batch size: 67, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:17:15,426 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
22 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-12 21:17:30,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1831180.0, ans=0.125 2024-08-12 21:17:37,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-12 21:17:42,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.481e+01 2.738e+01 3.160e+01 4.519e+01, threshold=5.476e+01, percent-clipped=0.0 2024-08-12 21:17:42,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1831280.0, ans=0.95 2024-08-12 21:17:48,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1831280.0, ans=0.2 2024-08-12 21:17:52,105 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 21:18:19,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1831480.0, ans=0.0 2024-08-12 21:18:26,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9250, loss[loss=0.1173, beats_loss=0.008284, ecapa_loss=0.000179, whisper_loss=0.1072, over 14816.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01091, ecapa_loss=0.0001723, whisper_loss=0.09182, over 3913903.51 frames. ], batch size: 58, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:19:29,942 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-12 21:19:30,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1831980.0, ans=0.125 2024-08-12 21:19:35,829 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-12 21:19:41,601 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9300, loss[loss=0.08209, beats_loss=0.01272, ecapa_loss=0.0001634, whisper_loss=0.06773, over 14953.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01093, ecapa_loss=0.0001712, whisper_loss=0.09205, over 3932335.78 frames. ], batch size: 59, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:19:45,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1832080.0, ans=0.0 2024-08-12 21:20:02,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1832180.0, ans=0.0 2024-08-12 21:20:11,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1832280.0, ans=0.1 2024-08-12 21:20:11,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.613e+01 2.997e+01 3.337e+01 4.853e+01, threshold=5.993e+01, percent-clipped=0.0 2024-08-12 21:20:12,135 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-12 21:20:26,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1832380.0, ans=0.1 2024-08-12 21:20:35,052 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 21:20:38,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1832480.0, ans=0.1 2024-08-12 21:20:53,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. 
limit=15.0 2024-08-12 21:20:54,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9350, loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001862, whisper_loss=0.09152, over 22125.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01103, ecapa_loss=0.0001697, whisper_loss=0.09156, over 3911255.26 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:21:00,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1832580.0, ans=0.125 2024-08-12 21:21:03,631 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-12 21:21:13,503 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-12 21:21:23,453 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-12 21:21:25,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1832780.0, ans=0.1 2024-08-12 21:21:25,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-08-12 21:21:29,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1832780.0, ans=0.07 2024-08-12 21:21:55,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-12 21:21:56,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1832980.0, ans=0.0 2024-08-12 21:22:08,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9400, loss[loss=0.117, beats_loss=0.01034, ecapa_loss=0.0001682, whisper_loss=0.105, over 20921.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.000171, whisper_loss=0.0917, over 3892666.50 frames. ], batch size: 84, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:22:12,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 21:22:17,517 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-12 21:22:25,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1833180.0, ans=0.125 2024-08-12 21:22:40,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.361e+01 2.679e+01 2.977e+01 4.432e+01, threshold=5.358e+01, percent-clipped=0.0 2024-08-12 21:22:47,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1833280.0, ans=0.0 2024-08-12 21:22:58,824 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-12 21:23:21,539 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 21:23:21,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1833480.0, ans=0.125 2024-08-12 21:23:24,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9450, loss[loss=0.1179, beats_loss=0.01018, ecapa_loss=0.0001495, whisper_loss=0.1063, over 18858.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01102, ecapa_loss=0.0001703, whisper_loss=0.09122, over 3881249.45 frames. ], batch size: 71, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:23:28,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-12 21:23:34,044 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-12 21:23:34,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1833580.0, ans=0.125 2024-08-12 21:23:55,794 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-12 21:24:06,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1833780.0, ans=0.125 2024-08-12 21:24:23,822 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 21:24:25,199 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-12 21:24:25,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1833980.0, ans=0.125 2024-08-12 21:24:26,771 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 21:24:39,156 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9500, loss[loss=0.08938, beats_loss=0.01369, ecapa_loss=0.0001549, whisper_loss=0.07414, over 20674.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01096, ecapa_loss=0.0001713, whisper_loss=0.09184, over 3884992.06 frames. ], batch size: 84, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:24:41,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1834080.0, ans=0.2 2024-08-12 21:24:42,478 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
18 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-12 21:24:46,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1834080.0, ans=0.2 2024-08-12 21:25:07,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1834280.0, ans=0.0 2024-08-12 21:25:09,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.426e+01 2.699e+01 3.219e+01 5.763e+01, threshold=5.398e+01, percent-clipped=1.0 2024-08-12 21:25:20,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1834280.0, ans=0.1 2024-08-12 21:25:24,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1834380.0, ans=0.0 2024-08-12 21:25:25,021 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 21:25:26,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1834380.0, ans=0.0 2024-08-12 21:25:33,494 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-12 21:25:50,202 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9550, loss[loss=0.102, beats_loss=0.009445, ecapa_loss=0.0002048, whisper_loss=0.09054, over 20504.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01099, ecapa_loss=0.0001714, whisper_loss=0.09144, over 3869278.37 frames. ], batch size: 86, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:26:07,160 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 21:26:07,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1834680.0, ans=0.0 2024-08-12 21:26:22,299 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-12 21:26:25,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1834780.0, ans=0.2 2024-08-12 21:26:27,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1834780.0, ans=0.1 2024-08-12 21:26:29,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-12 21:26:32,680 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:26:41,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1834880.0, ans=0.125 2024-08-12 21:27:01,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9600, loss[loss=0.128, beats_loss=0.008248, ecapa_loss=0.0001738, whisper_loss=0.118, over 21855.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001719, whisper_loss=0.09197, over 3845495.43 frames. 
], batch size: 81, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:27:27,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1835280.0, ans=0.125 2024-08-12 21:27:30,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.578e+01 2.916e+01 3.452e+01 6.223e+01, threshold=5.833e+01, percent-clipped=1.0 2024-08-12 21:27:45,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-08-12 21:27:51,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1835380.0, ans=0.0 2024-08-12 21:28:09,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1835580.0, ans=0.125 2024-08-12 21:28:10,166 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9650, loss[loss=0.1245, beats_loss=0.009972, ecapa_loss=0.0001892, whisper_loss=0.1126, over 21840.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01081, ecapa_loss=0.0001725, whisper_loss=0.09232, over 3822072.20 frames. ], batch size: 87, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:28:10,281 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-12 21:28:22,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1835580.0, ans=0.125 2024-08-12 21:28:33,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-12 21:28:46,447 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
15 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 21:29:12,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1835980.0, ans=0.04949747468305833 2024-08-12 21:29:19,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9700, loss[loss=0.08527, beats_loss=0.01144, ecapa_loss=0.00017, whisper_loss=0.07213, over 16219.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001741, whisper_loss=0.09137, over 3804451.61 frames. ], batch size: 65, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:29:42,032 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-12 21:29:48,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. limit=6.0 2024-08-12 21:29:48,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.429e+01 2.686e+01 3.028e+01 5.758e+01, threshold=5.372e+01, percent-clipped=0.0 2024-08-12 21:30:00,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1836380.0, ans=0.125 2024-08-12 21:30:14,097 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-12 21:30:30,600 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9750, loss[loss=0.09723, beats_loss=0.007042, ecapa_loss=0.0001898, whisper_loss=0.08829, over 14129.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001731, whisper_loss=0.09114, over 3795678.48 frames. ], batch size: 53, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:30:31,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. 
limit=15.0 2024-08-12 21:30:31,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=12.0 2024-08-12 21:30:32,124 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 21:30:48,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1836680.0, ans=0.0 2024-08-12 21:30:53,166 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-12 21:31:09,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-12 21:31:15,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1836880.0, ans=0.125 2024-08-12 21:31:27,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1836980.0, ans=0.0 2024-08-12 21:31:27,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-12 21:31:33,850 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-12 21:31:37,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1836980.0, ans=0.2 2024-08-12 21:31:39,094 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-12 21:31:42,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9800, loss[loss=0.1009, beats_loss=0.008669, ecapa_loss=0.0001671, whisper_loss=0.09055, over 17442.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01094, ecapa_loss=0.0001727, whisper_loss=0.09098, over 3818278.80 frames. 
], batch size: 67, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:31:47,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1837080.0, ans=0.1 2024-08-12 21:31:52,705 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 21:31:54,002 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 21:31:58,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1837180.0, ans=0.07 2024-08-12 21:31:59,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2024-08-12 21:32:12,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.082e+01 2.453e+01 2.781e+01 3.151e+01 8.550e+01, threshold=5.562e+01, percent-clipped=1.0 2024-08-12 21:32:14,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1837280.0, ans=0.09899494936611666 2024-08-12 21:32:25,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-08-12 21:32:29,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2024-08-12 21:32:44,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2024-08-12 21:32:54,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1837580.0, ans=0.2 2024-08-12 21:32:55,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9850, loss[loss=0.1067, beats_loss=0.00994, ecapa_loss=0.0001754, whisper_loss=0.09502, over 20774.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01097, ecapa_loss=0.0001723, whisper_loss=0.09152, over 3852152.09 frames. ], batch size: 84, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:32:55,854 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 25 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-12 21:33:14,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1837680.0, ans=0.2 2024-08-12 21:33:40,818 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 40 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 21:33:45,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1837880.0, ans=0.125 2024-08-12 21:33:50,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-12 21:33:52,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1837980.0, ans=0.125 2024-08-12 21:33:59,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1837980.0, ans=0.0 2024-08-12 21:34:06,975 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9900, loss[loss=0.1114, beats_loss=0.009819, ecapa_loss=0.0001927, whisper_loss=0.09963, over 21275.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.0001716, whisper_loss=0.09188, over 3884693.07 frames. 
], batch size: 87, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:34:10,046 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-12 21:34:10,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1838080.0, ans=0.0 2024-08-12 21:34:27,259 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-12 21:34:31,558 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-12 21:34:36,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.567e+01 2.799e+01 3.140e+01 5.231e+01, threshold=5.598e+01, percent-clipped=0.0 2024-08-12 21:34:40,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1838280.0, ans=0.0 2024-08-12 21:34:46,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1838280.0, ans=0.5 2024-08-12 21:35:16,832 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-12 21:35:21,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 9950, loss[loss=0.1133, beats_loss=0.01032, ecapa_loss=0.0001813, whisper_loss=0.1012, over 18691.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01106, ecapa_loss=0.000171, whisper_loss=0.09206, over 3897038.54 frames. 
], batch size: 73, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:35:46,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1838680.0, ans=0.125 2024-08-12 21:35:58,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1838780.0, ans=0.1 2024-08-12 21:36:06,174 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 21:36:16,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1838880.0, ans=0.0 2024-08-12 21:36:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1838880.0, ans=0.0 2024-08-12 21:36:24,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1838980.0, ans=0.1 2024-08-12 21:36:27,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1838980.0, ans=0.2 2024-08-12 21:36:36,569 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10000, loss[loss=0.1141, beats_loss=0.01264, ecapa_loss=0.000148, whisper_loss=0.1, over 17281.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01107, ecapa_loss=0.0001724, whisper_loss=0.09173, over 3891670.33 frames. ], batch size: 66, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:36:36,805 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-12 21:36:42,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1839080.0, ans=0.125 2024-08-12 21:36:49,351 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
10 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 21:36:51,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1839180.0, ans=0.125 2024-08-12 21:36:52,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2024-08-12 21:37:06,045 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.540e+01 2.812e+01 3.144e+01 2.734e+02, threshold=5.624e+01, percent-clipped=2.0 2024-08-12 21:37:12,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1839280.0, ans=0.1 2024-08-12 21:37:17,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1839280.0, ans=0.0 2024-08-12 21:37:29,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2024-08-12 21:37:44,660 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:37:46,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1839480.0, ans=0.125 2024-08-12 21:37:47,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1839580.0, ans=0.125 2024-08-12 21:37:48,446 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10050, loss[loss=0.1112, beats_loss=0.008649, ecapa_loss=0.000188, whisper_loss=0.1006, over 17296.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01102, ecapa_loss=0.0001718, whisper_loss=0.09159, over 3886548.46 frames. 
], batch size: 70, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:37:48,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1839580.0, ans=0.0 2024-08-12 21:38:11,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1839680.0, ans=0.2 2024-08-12 21:38:12,315 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 27 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-12 21:38:21,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1839680.0, ans=0.0 2024-08-12 21:38:21,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1839680.0, ans=0.125 2024-08-12 21:38:25,750 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-12 21:38:27,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1839780.0, ans=0.125 2024-08-12 21:38:32,229 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-12 21:38:35,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-08-12 21:38:36,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1839780.0, ans=0.125 2024-08-12 21:38:45,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1839880.0, ans=0.125 2024-08-12 21:38:56,120 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
15 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 21:38:57,755 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-184000.pt 2024-08-12 21:39:10,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2024-08-12 21:39:11,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1840080.0, ans=0.2 2024-08-12 21:39:12,386 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10100, loss[loss=0.1044, beats_loss=0.01174, ecapa_loss=0.0001624, whisper_loss=0.091, over 22611.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01098, ecapa_loss=0.0001725, whisper_loss=0.09229, over 3905415.55 frames. ], batch size: 90, lr: 4.81e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:39:14,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1840080.0, ans=0.0 2024-08-12 21:39:45,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.534e+01 2.755e+01 3.172e+01 9.610e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-12 21:39:59,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1840380.0, ans=0.0 2024-08-12 21:40:09,295 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 21:40:34,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10150, loss[loss=0.1173, beats_loss=0.008317, ecapa_loss=0.0001739, whisper_loss=0.1072, over 18200.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001731, whisper_loss=0.09241, over 3925335.71 frames. 
], batch size: 71, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:40:39,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1840580.0, ans=0.1 2024-08-12 21:40:56,084 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 21:40:56,424 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.231e+01 2024-08-12 21:40:58,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1840680.0, ans=0.125 2024-08-12 21:41:50,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1840980.0, ans=0.1 2024-08-12 21:41:57,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2024-08-12 21:42:00,683 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 21:42:08,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10200, loss[loss=0.09528, beats_loss=0.008522, ecapa_loss=0.0002139, whisper_loss=0.08461, over 19804.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001725, whisper_loss=0.09207, over 3911396.30 frames. ], batch size: 83, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:42:15,924 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 21:42:16,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1841080.0, ans=0.125 2024-08-12 21:42:22,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1841080.0, ans=0.125 2024-08-12 21:42:28,569 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 33 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 21:42:52,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-12 21:42:53,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+01 2.450e+01 2.670e+01 3.042e+01 4.548e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-12 21:43:02,626 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-12 21:43:04,704 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-12 21:43:13,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1841380.0, ans=0.0 2024-08-12 21:43:35,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-12 21:43:57,411 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10250, loss[loss=0.09441, beats_loss=0.01101, ecapa_loss=0.0001621, whisper_loss=0.08178, over 16573.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01084, ecapa_loss=0.0001724, whisper_loss=0.09189, over 3841149.50 frames. 
], batch size: 64, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:44:04,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1841580.0, ans=0.0 2024-08-12 21:44:42,002 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-12 21:44:50,606 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 21:45:10,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1841880.0, ans=15.0 2024-08-12 21:45:16,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=22.5 2024-08-12 21:45:23,841 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-12 21:45:41,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1841980.0, ans=0.1 2024-08-12 21:45:45,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1842080.0, ans=0.125 2024-08-12 21:45:47,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10300, loss[loss=0.1206, beats_loss=0.00968, ecapa_loss=0.0001762, whisper_loss=0.1092, over 23409.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001723, whisper_loss=0.09203, over 3879749.72 frames. ], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:45:47,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1842080.0, ans=0.125 2024-08-12 21:46:23,831 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
23 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 21:46:37,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.506e+01 2.751e+01 3.160e+01 4.441e+01, threshold=5.501e+01, percent-clipped=0.0 2024-08-12 21:46:38,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1842280.0, ans=0.1 2024-08-12 21:47:01,341 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-12 21:47:19,747 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-12 21:47:20,842 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 33 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 21:47:30,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1842480.0, ans=0.125 2024-08-12 21:47:30,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-12 21:47:33,636 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10350, loss[loss=0.09722, beats_loss=0.01324, ecapa_loss=0.000178, whisper_loss=0.0822, over 22326.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01094, ecapa_loss=0.0001714, whisper_loss=0.09194, over 3915485.58 frames. ], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:47:35,050 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-12 21:47:45,936 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 21:48:06,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1842780.0, ans=0.125 2024-08-12 21:48:22,796 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-12 21:48:23,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1842880.0, ans=0.125 2024-08-12 21:48:25,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1842880.0, ans=0.125 2024-08-12 21:48:33,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1842980.0, ans=0.125 2024-08-12 21:48:38,536 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-12 21:48:43,390 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 21:48:45,874 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10400, loss[loss=0.1007, beats_loss=0.01287, ecapa_loss=0.0001716, whisper_loss=0.08616, over 22341.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.000171, whisper_loss=0.0916, over 3895611.13 frames. ], batch size: 94, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:49:16,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.438e+01 2.753e+01 3.076e+01 5.598e+01, threshold=5.505e+01, percent-clipped=1.0 2024-08-12 21:49:39,359 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-12 21:49:45,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1843480.0, ans=0.1 2024-08-12 21:49:48,189 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-12 21:49:55,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1843480.0, ans=0.0 2024-08-12 21:49:57,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1843480.0, ans=0.0 2024-08-12 21:49:58,876 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.485e+01 2024-08-12 21:49:59,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10450, loss[loss=0.08977, beats_loss=0.01199, ecapa_loss=0.0001402, whisper_loss=0.07638, over 19151.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01085, ecapa_loss=0.0001723, whisper_loss=0.09159, over 3887422.27 frames. ], batch size: 76, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:50:01,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1843580.0, ans=0.1 2024-08-12 21:50:05,916 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 29 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-12 21:50:18,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1843680.0, ans=0.125 2024-08-12 21:50:18,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1843680.0, ans=0.125 2024-08-12 21:50:38,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. 
limit=22.5 2024-08-12 21:50:53,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1843880.0, ans=0.125 2024-08-12 21:50:59,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1843980.0, ans=0.0 2024-08-12 21:51:09,097 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 21:51:12,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1843980.0, ans=0.125 2024-08-12 21:51:14,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10500, loss[loss=0.1092, beats_loss=0.0105, ecapa_loss=0.0001759, whisper_loss=0.09692, over 23222.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.000172, whisper_loss=0.09131, over 3864918.96 frames. ], batch size: 92, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:51:23,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1844080.0, ans=0.0 2024-08-12 21:51:41,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1844180.0, ans=0.0 2024-08-12 21:51:42,916 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-12 21:51:45,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.358e+01 2.688e+01 3.093e+01 1.105e+02, threshold=5.376e+01, percent-clipped=1.0 2024-08-12 21:51:58,022 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 21:52:02,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1844380.0, ans=0.125 2024-08-12 21:52:12,674 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
21 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-12 21:52:30,764 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10550, loss[loss=0.07939, beats_loss=0.01624, ecapa_loss=0.0001352, whisper_loss=0.06179, over 21339.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01091, ecapa_loss=0.0001713, whisper_loss=0.09094, over 3839716.49 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:52:31,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1844580.0, ans=0.125 2024-08-12 21:52:35,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1844580.0, ans=0.125 2024-08-12 21:52:38,135 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-12 21:52:40,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1844580.0, ans=0.0 2024-08-12 21:52:45,817 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 19 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-12 21:53:20,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1844880.0, ans=0.125 2024-08-12 21:53:26,051 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-12 21:53:26,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1844880.0, ans=0.125 2024-08-12 21:53:27,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1844880.0, ans=0.125 2024-08-12 21:53:34,334 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-12 21:53:48,908 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10600, loss[loss=0.118, beats_loss=0.008993, ecapa_loss=0.0002209, whisper_loss=0.1068, over 21902.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001711, whisper_loss=0.09099, over 3842822.38 frames. ], batch size: 89, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:53:50,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1845080.0, ans=0.0 2024-08-12 21:53:54,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1845080.0, ans=0.125 2024-08-12 21:53:55,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-12 21:53:56,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1845080.0, ans=0.1 2024-08-12 21:54:00,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1845080.0, ans=0.1 2024-08-12 21:54:02,000 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 21:54:07,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1845180.0, ans=0.07 2024-08-12 21:54:16,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1845180.0, ans=0.1 2024-08-12 21:54:21,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.510e+01 2.765e+01 3.245e+01 5.665e+01, threshold=5.530e+01, percent-clipped=1.0 2024-08-12 21:54:21,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2024-08-12 21:54:31,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1845280.0, ans=0.125 2024-08-12 21:54:57,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1845480.0, ans=0.125 2024-08-12 21:55:01,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0 2024-08-12 21:55:04,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10650, loss[loss=0.09595, beats_loss=0.01066, ecapa_loss=0.0001889, whisper_loss=0.0834, over 16597.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001698, whisper_loss=0.09093, over 3851478.58 frames. ], batch size: 67, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:55:07,094 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-12 21:55:10,231 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.458e-02 2024-08-12 21:55:12,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1845580.0, ans=0.1 2024-08-12 21:55:34,824 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 21:55:36,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1845780.0, ans=0.1 2024-08-12 21:55:46,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-08-12 21:55:50,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2024-08-12 21:56:05,770 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 21:56:07,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1845980.0, ans=0.125 2024-08-12 21:56:11,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2024-08-12 21:56:17,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1845980.0, ans=0.2 2024-08-12 21:56:23,586 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10700, loss[loss=0.1111, beats_loss=0.01111, ecapa_loss=0.0001475, whisper_loss=0.09856, over 17916.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.0001703, whisper_loss=0.09116, over 3881196.16 frames. 
], batch size: 66, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:56:51,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=1846180.0, ans=0.025 2024-08-12 21:56:55,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+01 2.619e+01 2.989e+01 3.264e+01 5.454e+01, threshold=5.979e+01, percent-clipped=0.0 2024-08-12 21:57:01,498 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-12 21:57:09,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1846380.0, ans=0.125 2024-08-12 21:57:11,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1846380.0, ans=0.125 2024-08-12 21:57:27,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1846480.0, ans=0.125 2024-08-12 21:57:40,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10750, loss[loss=0.1041, beats_loss=0.009943, ecapa_loss=0.0002375, whisper_loss=0.09179, over 18980.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01094, ecapa_loss=0.000171, whisper_loss=0.09127, over 3889400.16 frames. ], batch size: 81, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:57:43,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1846580.0, ans=0.025 2024-08-12 21:57:51,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-12 21:57:59,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. 
limit=15.0 2024-08-12 21:58:23,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1846880.0, ans=0.125 2024-08-12 21:58:41,316 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-12 21:58:53,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10800, loss[loss=0.09276, beats_loss=0.011, ecapa_loss=0.0002068, whisper_loss=0.07969, over 16694.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01095, ecapa_loss=0.0001702, whisper_loss=0.09127, over 3896597.33 frames. ], batch size: 72, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 21:58:53,896 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 21:58:59,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1847080.0, ans=0.0 2024-08-12 21:59:01,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=22.5 2024-08-12 21:59:19,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1847180.0, ans=0.125 2024-08-12 21:59:23,682 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.464e+01 2.831e+01 3.292e+01 5.711e+01, threshold=5.661e+01, percent-clipped=0.0 2024-08-12 21:59:28,287 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.701e-03 2024-08-12 21:59:33,766 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-12 21:59:39,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. 
limit=15.0 2024-08-12 21:59:42,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1847380.0, ans=0.2 2024-08-12 21:59:45,578 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-12 21:59:45,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1847380.0, ans=0.0 2024-08-12 21:59:52,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1847480.0, ans=0.125 2024-08-12 21:59:54,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-12 21:59:59,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1847480.0, ans=0.0 2024-08-12 22:00:05,195 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10850, loss[loss=0.08523, beats_loss=0.01015, ecapa_loss=0.0002108, whisper_loss=0.07297, over 18703.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01095, ecapa_loss=0.0001716, whisper_loss=0.0917, over 3927821.25 frames. ], batch size: 82, lr: 4.80e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:00:08,174 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:00:09,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1847580.0, ans=0.125 2024-08-12 22:00:24,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.53 vs. 
limit=22.5 2024-08-12 22:00:29,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1847680.0, ans=0.1 2024-08-12 22:00:32,755 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 22:00:40,761 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-12 22:00:54,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=22.5 2024-08-12 22:01:17,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10900, loss[loss=0.1195, beats_loss=0.009841, ecapa_loss=0.0001635, whisper_loss=0.108, over 18901.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001721, whisper_loss=0.0921, over 3923570.85 frames. ], batch size: 72, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:01:24,791 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 22:01:36,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=1848180.0, ans=0.05 2024-08-12 22:01:47,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-08-12 22:01:48,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.539e+01 2.752e+01 3.152e+01 5.586e+01, threshold=5.505e+01, percent-clipped=0.0 2024-08-12 22:02:04,930 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-12 22:02:15,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1848480.0, ans=0.0 2024-08-12 22:02:23,121 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-12 22:02:32,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 10950, loss[loss=0.1045, beats_loss=0.01029, ecapa_loss=0.0001854, whisper_loss=0.09239, over 22380.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.0001734, whisper_loss=0.0924, over 3936425.21 frames. ], batch size: 93, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:02:40,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1848580.0, ans=0.95 2024-08-12 22:02:43,260 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 34 from Vox, 32 fro AS 2024-08-12 22:02:53,753 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 37 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-12 22:03:07,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.45 vs. limit=15.0 2024-08-12 22:03:12,617 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-12 22:03:16,762 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 22:03:20,693 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 22:03:25,856 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-12 22:03:30,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1848880.0, ans=0.125 2024-08-12 22:03:41,758 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 22:03:44,470 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-12 22:03:47,207 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11000, loss[loss=0.1223, beats_loss=0.01152, ecapa_loss=0.0001711, whisper_loss=0.109, over 22724.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001743, whisper_loss=0.09239, over 3942716.69 frames. ], batch size: 91, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:03:52,668 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-12 22:03:57,096 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 22:03:59,914 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-12 22:04:03,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1849180.0, ans=0.125 2024-08-12 22:04:03,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1849180.0, ans=0.125 2024-08-12 22:04:10,496 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-12 22:04:11,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. 
limit=22.5 2024-08-12 22:04:18,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.465e+01 2.797e+01 3.199e+01 6.867e+01, threshold=5.594e+01, percent-clipped=1.0 2024-08-12 22:04:19,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1849280.0, ans=0.025 2024-08-12 22:04:27,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1849280.0, ans=0.1 2024-08-12 22:04:36,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2024-08-12 22:04:40,368 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-12 22:04:45,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1849480.0, ans=0.2 2024-08-12 22:04:58,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11050, loss[loss=0.115, beats_loss=0.01045, ecapa_loss=0.0001864, whisper_loss=0.1027, over 21868.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01082, ecapa_loss=0.0001746, whisper_loss=0.0928, over 3950265.65 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:05:06,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1849580.0, ans=0.07 2024-08-12 22:05:07,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1849580.0, ans=0.2 2024-08-12 22:05:23,167 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 22:05:36,208 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
20 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 22:05:46,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1849880.0, ans=0.2 2024-08-12 22:05:48,990 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-12 22:06:02,683 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-12 22:06:11,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11100, loss[loss=0.1029, beats_loss=0.01116, ecapa_loss=0.0001818, whisper_loss=0.08989, over 21308.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01084, ecapa_loss=0.0001726, whisper_loss=0.09295, over 3957472.42 frames. ], batch size: 89, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:06:27,598 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-12 22:06:44,658 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.458e+01 2.677e+01 3.068e+01 5.581e+01, threshold=5.354e+01, percent-clipped=0.0 2024-08-12 22:06:49,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1850280.0, ans=0.125 2024-08-12 22:06:52,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1850280.0, ans=0.0 2024-08-12 22:06:59,676 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-12 22:07:26,803 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11150, loss[loss=0.09178, beats_loss=0.01221, ecapa_loss=0.0001584, whisper_loss=0.07798, over 17418.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01079, ecapa_loss=0.0001721, whisper_loss=0.09306, over 3943734.17 frames. 
], batch size: 69, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:07:32,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=12.0 2024-08-12 22:07:36,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2024-08-12 22:07:39,140 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:07:39,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-08-12 22:07:42,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1850680.0, ans=0.125 2024-08-12 22:07:57,127 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-12 22:07:57,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1850780.0, ans=0.125 2024-08-12 22:08:12,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=6.0 2024-08-12 22:08:13,473 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-12 22:08:15,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1850880.0, ans=0.125 2024-08-12 22:08:21,009 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-12 22:08:24,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.23 vs. 
limit=22.5 2024-08-12 22:08:29,626 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-12 22:08:38,278 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-12 22:08:41,519 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11200, loss[loss=0.09607, beats_loss=0.01109, ecapa_loss=0.0002087, whisper_loss=0.08289, over 15060.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01078, ecapa_loss=0.0001721, whisper_loss=0.09332, over 3939848.51 frames. ], batch size: 63, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:08:47,806 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 22:08:49,098 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-12 22:08:53,641 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-12 22:08:53,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1851080.0, ans=0.125 2024-08-12 22:09:14,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.199e+01 2.512e+01 2.839e+01 3.173e+01 1.150e+02, threshold=5.678e+01, percent-clipped=1.0 2024-08-12 22:09:31,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1851380.0, ans=0.04949747468305833 2024-08-12 22:09:41,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2024-08-12 22:09:52,539 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-12 22:10:00,855 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11250, loss[loss=0.112, beats_loss=0.009696, ecapa_loss=0.0001351, whisper_loss=0.101, over 23459.00 frames. 
], tot_loss[loss=0.1056, beats_loss=0.01084, ecapa_loss=0.0001712, whisper_loss=0.09302, over 3929529.61 frames. ], batch size: 89, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:10:12,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1851580.0, ans=0.0 2024-08-12 22:10:20,733 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-12 22:10:35,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1851780.0, ans=0.125 2024-08-12 22:10:35,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0 2024-08-12 22:10:39,315 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-12 22:10:45,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1851880.0, ans=0.2 2024-08-12 22:10:52,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1851880.0, ans=0.125 2024-08-12 22:10:57,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1851880.0, ans=0.0 2024-08-12 22:11:03,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1851980.0, ans=0.1 2024-08-12 22:11:08,285 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-12 22:11:08,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1851980.0, ans=0.125 2024-08-12 22:11:18,057 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11300, loss[loss=0.1002, beats_loss=0.01372, ecapa_loss=0.0001211, whisper_loss=0.08523, over 23217.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001698, whisper_loss=0.09214, over 3898637.29 frames. ], batch size: 92, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:11:36,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1852180.0, ans=10.0 2024-08-12 22:11:40,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:11:49,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1852180.0, ans=0.125 2024-08-12 22:11:55,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.542e+01 2.832e+01 3.166e+01 7.074e+01, threshold=5.665e+01, percent-clipped=1.0 2024-08-12 22:12:00,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1852280.0, ans=0.0 2024-08-12 22:12:13,783 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-12 22:12:20,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1852380.0, ans=0.0 2024-08-12 22:12:30,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1852480.0, ans=0.0 2024-08-12 22:12:36,055 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
28 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-12 22:12:39,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1852580.0, ans=0.2 2024-08-12 22:12:40,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11350, loss[loss=0.1054, beats_loss=0.01221, ecapa_loss=0.0001537, whisper_loss=0.09166, over 20885.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001696, whisper_loss=0.0921, over 3902952.90 frames. ], batch size: 81, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:12:49,483 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 22:12:49,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1852580.0, ans=0.0 2024-08-12 22:13:08,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2024-08-12 22:13:23,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-12 22:13:32,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1852880.0, ans=0.125 2024-08-12 22:13:37,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.08 vs. 
limit=12.0 2024-08-12 22:13:46,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1852980.0, ans=0.2 2024-08-12 22:14:00,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1853080.0, ans=0.2 2024-08-12 22:14:02,008 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11400, loss[loss=0.08237, beats_loss=0.01251, ecapa_loss=0.0001479, whisper_loss=0.06839, over 13606.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01092, ecapa_loss=0.0001699, whisper_loss=0.09156, over 3907604.75 frames. ], batch size: 54, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:14:33,224 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 19 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-12 22:14:36,069 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.651e+01 3.000e+01 3.420e+01 5.421e+01, threshold=6.000e+01, percent-clipped=0.0 2024-08-12 22:15:01,162 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-12 22:15:04,893 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 22:15:06,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1853480.0, ans=0.125 2024-08-12 22:15:09,234 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-12 22:15:17,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=15.0 2024-08-12 22:15:19,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11450, loss[loss=0.09623, beats_loss=0.0119, ecapa_loss=0.000198, whisper_loss=0.08234, over 22199.00 frames. 
], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001699, whisper_loss=0.09218, over 3911523.08 frames. ], batch size: 92, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:15:28,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1853580.0, ans=0.125 2024-08-12 22:15:29,840 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:15:38,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1853680.0, ans=0.125 2024-08-12 22:15:39,454 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 32 from Vox, 25 fro AS 2024-08-12 22:15:39,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1853680.0, ans=0.125 2024-08-12 22:15:41,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1853680.0, ans=0.125 2024-08-12 22:15:59,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1853780.0, ans=0.125 2024-08-12 22:16:09,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-12 22:16:32,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1853980.0, ans=10.0 2024-08-12 22:16:41,178 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11500, loss[loss=0.1104, beats_loss=0.009443, ecapa_loss=0.0001502, whisper_loss=0.09945, over 16353.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001701, whisper_loss=0.0918, over 3931897.48 frames. 
], batch size: 62, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:16:55,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.68 vs. limit=8.0 2024-08-12 22:17:05,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1854180.0, ans=0.125 2024-08-12 22:17:13,449 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-12 22:17:17,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.445e+01 2.643e+01 2.952e+01 4.086e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-12 22:17:21,002 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 22:17:33,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1854380.0, ans=0.125 2024-08-12 22:17:56,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-12 22:18:03,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11550, loss[loss=0.1071, beats_loss=0.01267, ecapa_loss=0.0001404, whisper_loss=0.09306, over 21365.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01097, ecapa_loss=0.0001701, whisper_loss=0.09186, over 3928633.12 frames. ], batch size: 84, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:18:07,969 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 22:18:08,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1854580.0, ans=0.0 2024-08-12 22:18:09,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1854580.0, ans=0.0 2024-08-12 22:18:11,980 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-12 22:18:23,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1854680.0, ans=0.0 2024-08-12 22:18:39,563 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-12 22:18:39,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1854780.0, ans=0.0 2024-08-12 22:19:00,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1854880.0, ans=0.2 2024-08-12 22:19:05,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-08-12 22:19:24,471 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11600, loss[loss=0.08784, beats_loss=0.01083, ecapa_loss=0.0001784, whisper_loss=0.07522, over 19415.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001702, whisper_loss=0.09126, over 3935472.81 frames. ], batch size: 81, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:19:28,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. 
limit=22.5 2024-08-12 22:19:46,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1855180.0, ans=0.09899494936611666 2024-08-12 22:19:51,388 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-12 22:19:54,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-12 22:20:00,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.514e+01 2.737e+01 3.107e+01 4.746e+01, threshold=5.475e+01, percent-clipped=0.0 2024-08-12 22:20:04,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1855280.0, ans=0.0 2024-08-12 22:20:16,691 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-12 22:20:28,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1855480.0, ans=0.125 2024-08-12 22:20:31,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1855480.0, ans=0.125 2024-08-12 22:20:33,973 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 22:20:43,067 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11650, loss[loss=0.09874, beats_loss=0.01175, ecapa_loss=0.0002136, whisper_loss=0.08485, over 18486.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001692, whisper_loss=0.09117, over 3951903.19 frames. ], batch size: 76, lr: 4.79e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:20:53,777 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-12 22:20:59,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1855680.0, ans=0.1 2024-08-12 22:21:08,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1855680.0, ans=0.125 2024-08-12 22:21:17,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1855780.0, ans=0.025 2024-08-12 22:21:44,941 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-12 22:21:48,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1855980.0, ans=0.125 2024-08-12 22:22:03,151 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11700, loss[loss=0.1108, beats_loss=0.01141, ecapa_loss=0.0001817, whisper_loss=0.09761, over 22011.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.011, ecapa_loss=0.0001692, whisper_loss=0.09222, over 3967379.75 frames. ], batch size: 87, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:22:11,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1856080.0, ans=0.0 2024-08-12 22:22:24,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=12.0 2024-08-12 22:22:29,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1856180.0, ans=0.0 2024-08-12 22:22:33,446 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-12 22:22:36,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1856280.0, ans=0.07 2024-08-12 22:22:39,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+01 2.485e+01 2.712e+01 3.027e+01 7.497e+01, threshold=5.424e+01, percent-clipped=1.0 2024-08-12 22:22:50,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1856280.0, ans=0.1 2024-08-12 22:23:11,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2024-08-12 22:23:27,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11750, loss[loss=0.1035, beats_loss=0.01287, ecapa_loss=0.0001904, whisper_loss=0.08875, over 15217.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01103, ecapa_loss=0.0001698, whisper_loss=0.09248, over 3977486.63 frames. ], batch size: 63, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:23:28,945 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-12 22:23:31,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-08-12 22:23:36,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=22.5 2024-08-12 22:24:16,428 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:24:28,271 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-12 22:24:28,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1856880.0, ans=0.125 2024-08-12 22:24:33,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1856980.0, ans=0.0 2024-08-12 22:24:35,989 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-12 22:24:39,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-08-12 22:24:45,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11800, loss[loss=0.1088, beats_loss=0.01196, ecapa_loss=0.0001425, whisper_loss=0.09537, over 22391.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01109, ecapa_loss=0.0001707, whisper_loss=0.09214, over 3998984.26 frames. ], batch size: 89, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:24:46,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1857080.0, ans=10.0 2024-08-12 22:24:59,393 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-12 22:25:05,044 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 40 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 22:25:05,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1857180.0, ans=0.125 2024-08-12 22:25:06,897 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-12 22:25:08,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1857180.0, ans=0.125 2024-08-12 22:25:21,320 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.559e+01 2.833e+01 3.342e+01 5.764e+01, threshold=5.666e+01, percent-clipped=1.0 2024-08-12 22:25:21,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1857280.0, ans=0.125 2024-08-12 22:25:42,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-12 22:25:52,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1857480.0, ans=0.0 2024-08-12 22:26:06,702 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11850, loss[loss=0.1198, beats_loss=0.01138, ecapa_loss=0.0001646, whisper_loss=0.1067, over 18713.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01111, ecapa_loss=0.0001712, whisper_loss=0.0916, over 3992477.59 frames. ], batch size: 74, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:26:24,448 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-12 22:26:24,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1857680.0, ans=0.125 2024-08-12 22:26:46,784 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-12 22:26:58,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1857880.0, ans=0.1 2024-08-12 22:27:22,085 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11900, loss[loss=0.1029, beats_loss=0.01296, ecapa_loss=0.0001335, whisper_loss=0.08862, over 23008.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01108, ecapa_loss=0.000171, whisper_loss=0.09188, over 3956081.05 frames. ], batch size: 89, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:27:31,926 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 29 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-12 22:27:47,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1858180.0, ans=0.0 2024-08-12 22:27:52,923 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.540e+01 2.783e+01 3.070e+01 4.680e+01, threshold=5.566e+01, percent-clipped=0.0 2024-08-12 22:27:59,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1858280.0, ans=0.0 2024-08-12 22:28:03,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1858380.0, ans=0.0 2024-08-12 22:28:10,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.32 vs. 
limit=15.0 2024-08-12 22:28:21,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1858480.0, ans=0.0 2024-08-12 22:28:24,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1858480.0, ans=0.0 2024-08-12 22:28:25,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1858480.0, ans=0.125 2024-08-12 22:28:31,955 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 11950, loss[loss=0.09522, beats_loss=0.01029, ecapa_loss=0.0001709, whisper_loss=0.08322, over 17129.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01095, ecapa_loss=0.0001718, whisper_loss=0.09235, over 3924959.14 frames. ], batch size: 67, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:28:33,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1858580.0, ans=0.125 2024-08-12 22:28:33,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1858580.0, ans=0.2 2024-08-12 22:28:35,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-08-12 22:28:43,333 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 22:28:46,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-12 22:28:50,134 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 22:29:08,694 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
23 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-12 22:29:15,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1858880.0, ans=0.2 2024-08-12 22:29:23,362 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 22:29:33,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1858980.0, ans=0.0 2024-08-12 22:29:39,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12000, loss[loss=0.09052, beats_loss=0.01275, ecapa_loss=0.000181, whisper_loss=0.07595, over 20459.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01097, ecapa_loss=0.0001702, whisper_loss=0.09249, over 3915573.93 frames. ], batch size: 85, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:29:39,564 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 22:30:19,826 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on ASR_libri: loss=0.2562, beats_loss=0, ecapa_loss=0.0005805, whisper_loss=0.2504, over 922467.00 frames. 2024-08-12 22:30:37,907 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on SV_voxceleb1: loss=0.004691, beats_loss=0, ecapa_loss=0.0004691, whisper_loss=0, over 939242.00 frames. 2024-08-12 22:32:33,539 INFO [train_multi_KD3.py:1149] (0/4) Epoch 13, validation on AT_audioset: loss=0.02411, beats_loss=0.02411, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 22:32:33,544 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 22:32:41,017 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 22:32:52,757 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
20 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-12 22:33:04,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+01 2.535e+01 2.857e+01 3.270e+01 5.667e+01, threshold=5.714e+01, percent-clipped=0.0 2024-08-12 22:33:08,838 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-12 22:33:10,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1859280.0, ans=0.0 2024-08-12 22:33:13,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1859280.0, ans=0.125 2024-08-12 22:33:19,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1859380.0, ans=0.2 2024-08-12 22:33:26,198 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 22:33:38,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1859480.0, ans=0.125 2024-08-12 22:33:46,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12050, loss[loss=0.1131, beats_loss=0.01055, ecapa_loss=0.000183, whisper_loss=0.1007, over 18061.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01105, ecapa_loss=0.0001707, whisper_loss=0.09133, over 3896260.36 frames. ], batch size: 72, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:33:57,032 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-12 22:33:57,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1859580.0, ans=0.0 2024-08-12 22:34:03,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1859680.0, ans=0.0 2024-08-12 22:34:05,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1859680.0, ans=0.0 2024-08-12 22:34:07,013 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-12 22:34:09,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-08-12 22:34:19,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1859780.0, ans=0.2 2024-08-12 22:34:34,374 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-12 22:34:40,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1859880.0, ans=0.2 2024-08-12 22:34:42,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-08-12 22:34:50,654 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 22:34:54,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1859980.0, ans=0.125 2024-08-12 22:34:58,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12100, loss[loss=0.0807, beats_loss=0.01323, ecapa_loss=0.0001586, whisper_loss=0.06589, over 13905.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01109, ecapa_loss=0.00017, whisper_loss=0.09078, over 3879847.64 frames. ], batch size: 57, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:35:00,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1860080.0, ans=0.1 2024-08-12 22:35:17,329 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 33 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-12 22:35:17,623 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:35:18,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1860180.0, ans=0.1 2024-08-12 22:35:23,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1860180.0, ans=0.125 2024-08-12 22:35:27,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.529e+01 2.799e+01 3.028e+01 6.026e+01, threshold=5.598e+01, percent-clipped=1.0 2024-08-12 22:35:36,475 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 22:35:38,188 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 22:35:39,767 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-12 22:35:46,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1860380.0, ans=0.125 2024-08-12 22:36:06,678 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-12 22:36:08,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12150, loss[loss=0.1167, beats_loss=0.009414, ecapa_loss=0.0001665, whisper_loss=0.1056, over 20861.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01103, ecapa_loss=0.0001691, whisper_loss=0.09092, over 3872160.89 frames. ], batch size: 81, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:36:10,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=12.0 2024-08-12 22:36:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1860580.0, ans=0.125 2024-08-12 22:36:25,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1860680.0, ans=0.125 2024-08-12 22:36:32,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=12.0 2024-08-12 22:36:38,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1860780.0, ans=0.125 2024-08-12 22:36:40,031 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-12 22:36:42,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=15.0 2024-08-12 22:37:12,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1860980.0, ans=0.0 2024-08-12 22:37:18,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12200, loss[loss=0.1097, beats_loss=0.0107, ecapa_loss=0.0001513, whisper_loss=0.09745, over 21329.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01099, ecapa_loss=0.0001685, whisper_loss=0.09194, over 3882461.04 frames. 
], batch size: 82, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:37:22,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-08-12 22:37:24,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2024-08-12 22:37:39,915 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-12 22:37:47,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1861280.0, ans=0.125 2024-08-12 22:37:49,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.525e+01 2.741e+01 3.168e+01 5.471e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-12 22:37:57,139 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-12 22:38:29,562 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12250, loss[loss=0.1192, beats_loss=0.008151, ecapa_loss=0.000241, whisper_loss=0.1087, over 15087.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01094, ecapa_loss=0.0001674, whisper_loss=0.09264, over 3905688.01 frames. ], batch size: 63, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:38:51,198 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 31 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-12 22:38:51,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1861680.0, ans=0.125 2024-08-12 22:39:00,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1861780.0, ans=0.0 2024-08-12 22:39:11,164 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 22:39:12,971 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-12 22:39:21,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1861880.0, ans=0.0 2024-08-12 22:39:27,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1861980.0, ans=0.0 2024-08-12 22:39:38,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1861980.0, ans=0.0 2024-08-12 22:39:41,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12300, loss[loss=0.09776, beats_loss=0.01231, ecapa_loss=0.0001777, whisper_loss=0.08367, over 22171.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001679, whisper_loss=0.09187, over 3902735.76 frames. ], batch size: 92, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:39:44,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1862080.0, ans=0.2 2024-08-12 22:39:45,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1862080.0, ans=0.1 2024-08-12 22:39:47,652 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 22:39:58,741 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-12 22:40:07,228 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
27 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-12 22:40:11,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.506e+01 2.717e+01 3.049e+01 5.234e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-12 22:40:11,518 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-12 22:40:16,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1862280.0, ans=0.125 2024-08-12 22:40:20,595 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-12 22:40:40,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1862480.0, ans=0.2 2024-08-12 22:40:48,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=12.0 2024-08-12 22:40:48,570 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12350, loss[loss=0.09288, beats_loss=0.01091, ecapa_loss=0.0001995, whisper_loss=0.07997, over 14886.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01089, ecapa_loss=0.0001698, whisper_loss=0.09217, over 3886079.59 frames. ], batch size: 60, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:40:50,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2024-08-12 22:40:54,847 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-12 22:41:09,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1862680.0, ans=0.0 2024-08-12 22:41:17,106 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-12 22:41:19,747 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 22:41:25,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1862780.0, ans=0.2 2024-08-12 22:41:33,259 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 16 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-12 22:41:59,267 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12400, loss[loss=0.0866, beats_loss=0.01126, ecapa_loss=0.0001902, whisper_loss=0.07345, over 21546.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0109, ecapa_loss=0.0001695, whisper_loss=0.09149, over 3881743.02 frames. ], batch size: 93, lr: 4.78e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:42:28,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1863280.0, ans=0.2 2024-08-12 22:42:29,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.619e+01 2.853e+01 3.347e+01 1.216e+02, threshold=5.705e+01, percent-clipped=2.0 2024-08-12 22:42:31,343 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 33 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-12 22:42:46,607 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-12 22:43:01,203 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-12 22:43:08,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12450, loss[loss=0.1173, beats_loss=0.01033, ecapa_loss=0.0001971, whisper_loss=0.1051, over 19107.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01095, ecapa_loss=0.0001716, whisper_loss=0.09091, over 3877683.94 frames. 
], batch size: 77, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:43:21,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1863680.0, ans=0.0 2024-08-12 22:43:24,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1863680.0, ans=0.0 2024-08-12 22:43:33,065 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.873e+01 2024-08-12 22:43:53,310 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 37 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-12 22:44:04,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1863980.0, ans=0.0 2024-08-12 22:44:07,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2024-08-12 22:44:16,726 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 22:44:19,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12500, loss[loss=0.08641, beats_loss=0.01125, ecapa_loss=0.0001841, whisper_loss=0.07332, over 21642.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01082, ecapa_loss=0.0001729, whisper_loss=0.0921, over 3869733.14 frames. ], batch size: 91, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:44:37,198 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-12 22:44:37,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1864180.0, ans=0.2 2024-08-12 22:44:48,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1864280.0, ans=0.125 2024-08-12 22:44:49,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.439e+01 2.730e+01 3.074e+01 7.978e+01, threshold=5.460e+01, percent-clipped=1.0 2024-08-12 22:45:01,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1864380.0, ans=0.07 2024-08-12 22:45:03,891 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-12 22:45:26,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12550, loss[loss=0.09305, beats_loss=0.01183, ecapa_loss=0.0001832, whisper_loss=0.07939, over 19867.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01083, ecapa_loss=0.0001716, whisper_loss=0.09259, over 3901025.64 frames. ], batch size: 84, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:45:30,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-12 22:45:39,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1864680.0, ans=0.1 2024-08-12 22:45:41,646 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-12 22:46:06,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.38 vs. 
limit=15.0 2024-08-12 22:46:29,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1864980.0, ans=0.125 2024-08-12 22:46:31,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1864980.0, ans=0.125 2024-08-12 22:46:32,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1865080.0, ans=0.125 2024-08-12 22:46:33,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12600, loss[loss=0.1212, beats_loss=0.008117, ecapa_loss=0.0002025, whisper_loss=0.1111, over 21545.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01084, ecapa_loss=0.0001723, whisper_loss=0.0922, over 3890305.38 frames. ], batch size: 85, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:46:33,756 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-12 22:46:39,193 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 22:46:44,513 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-12 22:46:56,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2024-08-12 22:47:03,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.534e+01 2.817e+01 3.269e+01 5.497e+01, threshold=5.633e+01, percent-clipped=1.0 2024-08-12 22:47:04,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.27 vs. 
limit=15.0 2024-08-12 22:47:09,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1865280.0, ans=0.125 2024-08-12 22:47:10,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1865280.0, ans=0.0 2024-08-12 22:47:12,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-12 22:47:14,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1865380.0, ans=0.0 2024-08-12 22:47:20,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-12 22:47:33,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1865480.0, ans=0.125 2024-08-12 22:47:39,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1865480.0, ans=0.125 2024-08-12 22:47:42,013 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12650, loss[loss=0.09311, beats_loss=0.01157, ecapa_loss=0.0001531, whisper_loss=0.08001, over 18220.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001715, whisper_loss=0.09195, over 3868577.31 frames. 
], batch size: 71, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:48:05,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1865680.0, ans=0.125 2024-08-12 22:48:08,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1865780.0, ans=0.2 2024-08-12 22:48:17,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1865780.0, ans=0.125 2024-08-12 22:48:30,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1865880.0, ans=0.125 2024-08-12 22:48:45,111 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 17 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-12 22:48:50,292 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12700, loss[loss=0.1148, beats_loss=0.009243, ecapa_loss=0.0002339, whisper_loss=0.1032, over 17943.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001722, whisper_loss=0.09248, over 3883230.68 frames. ], batch size: 72, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:49:00,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1866080.0, ans=0.2 2024-08-12 22:49:09,315 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 22:49:15,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=22.5 2024-08-12 22:49:21,577 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.169e+01 2.468e+01 2.692e+01 3.051e+01 4.394e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-12 22:49:26,240 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-12 22:49:26,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1866280.0, ans=0.125 2024-08-12 22:49:30,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866280.0, ans=0.1 2024-08-12 22:49:48,943 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 22:49:59,644 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12750, loss[loss=0.09718, beats_loss=0.009185, ecapa_loss=0.0002518, whisper_loss=0.08548, over 13105.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01096, ecapa_loss=0.0001727, whisper_loss=0.09238, over 3850164.06 frames. ], batch size: 56, lr: 4.77e-03, grad_scale: 5.764607523034235e+17 2024-08-12 22:50:06,503 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 22:50:15,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-12 22:50:31,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1866780.0, ans=0.125 2024-08-12 22:50:32,304 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-12 22:51:00,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1866980.0, ans=0.1 2024-08-12 22:51:03,293 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 22:51:05,700 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12800, loss[loss=0.1045, beats_loss=0.01042, ecapa_loss=0.0001786, whisper_loss=0.09232, over 14362.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01101, ecapa_loss=0.0001739, whisper_loss=0.0922, over 3869400.11 frames. ], batch size: 58, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 22:51:12,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1867080.0, ans=0.125
2024-08-12 22:51:18,260 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 25 from Vox, 21 from AS
2024-08-12 22:51:22,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0
2024-08-12 22:51:27,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=12.0
2024-08-12 22:51:31,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1867280.0, ans=0.0
2024-08-12 22:51:35,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.415e+01 2.675e+01 2.893e+01 6.675e+01, threshold=5.350e+01, percent-clipped=1.0
2024-08-12 22:51:41,176 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 23 from LS+wenet, 13 from Vox, 27 from AS
2024-08-12 22:51:45,477 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 22:51:46,512 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 from AS
2024-08-12 22:51:48,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1867380.0, ans=0.125
2024-08-12 22:51:55,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1867380.0, ans=0.5
2024-08-12 22:52:07,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-08-12 22:52:08,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0
2024-08-12 22:52:13,124 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12850, loss[loss=0.0902, beats_loss=0.009987, ecapa_loss=0.0002191, whisper_loss=0.07802, over 19462.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001732, whisper_loss=0.09176, over 3878003.57 frames. ], batch size: 84, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 22:52:20,041 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS
2024-08-12 22:52:21,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1867580.0, ans=0.2
2024-08-12 22:52:23,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1867580.0, ans=10.0
2024-08-12 22:52:28,151 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 14 from LS+wenet, 20 from Vox, 19 from AS
2024-08-12 22:52:33,540 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 from AS
2024-08-12 22:52:39,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0
2024-08-12 22:52:44,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1867780.0, ans=0.0
2024-08-12 22:52:55,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1867880.0, ans=0.0
2024-08-12 22:52:55,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2024-08-12 22:52:58,938 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 from AS
2024-08-12 22:53:02,264 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 26 from Vox, 31 from AS
2024-08-12 22:53:15,206 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 from AS
2024-08-12 22:53:16,781 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 from AS
2024-08-12 22:53:17,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1867980.0, ans=0.025
2024-08-12 22:53:17,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1867980.0, ans=0.025
2024-08-12 22:53:23,064 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12900, loss[loss=0.1071, beats_loss=0.01073, ecapa_loss=0.0001774, whisper_loss=0.09459, over 17813.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01103, ecapa_loss=0.0001722, whisper_loss=0.09099, over 3866463.00 frames. ], batch size: 70, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 22:53:30,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=1868080.0, ans=12.0
2024-08-12 22:53:35,594 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 from AS
2024-08-12 22:53:37,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1868180.0, ans=0.125
2024-08-12 22:53:43,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1868180.0, ans=0.125
2024-08-12 22:53:46,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1868180.0, ans=0.125
2024-08-12 22:53:50,645 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 from AS
2024-08-12 22:53:53,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.434e+01 2.743e+01 3.168e+01 4.693e+01, threshold=5.486e+01, percent-clipped=0.0
2024-08-12 22:53:55,048 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 from AS
2024-08-12 22:54:14,947 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 from AS
2024-08-12 22:54:16,402 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 from AS
2024-08-12 22:54:18,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1868480.0, ans=0.125
2024-08-12 22:54:29,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1868480.0, ans=0.125
2024-08-12 22:54:32,660 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 12950, loss[loss=0.1067, beats_loss=0.008535, ecapa_loss=0.0002309, whisper_loss=0.09587, over 14240.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001725, whisper_loss=0.09093, over 3850325.93 frames. ], batch size: 57, lr: 4.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 22:54:40,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1868580.0, ans=0.1
2024-08-12 22:54:46,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1868680.0, ans=0.0
2024-08-12 22:54:50,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1868680.0, ans=0.0
2024-08-12 22:55:05,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1868780.0, ans=0.125
2024-08-12 22:55:16,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1868880.0, ans=0.125
2024-08-12 22:55:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1868980.0, ans=0.0
2024-08-12 22:55:28,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1868980.0, ans=0.125
2024-08-12 22:55:32,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1868980.0, ans=0.125
2024-08-12 22:55:33,436 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 from AS
2024-08-12 22:55:33,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2024-08-12 22:55:40,324 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13000, loss[loss=0.08774, beats_loss=0.01391, ecapa_loss=0.0001541, whisper_loss=0.07229, over 15231.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01099, ecapa_loss=0.0001731, whisper_loss=0.0908, over 3873377.64 frames. ], batch size: 63, lr: 4.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 22:55:43,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1869080.0, ans=0.125
2024-08-12 22:55:48,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1869080.0, ans=0.1
2024-08-12 22:55:51,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1869080.0, ans=0.125
2024-08-12 22:56:04,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 22:56:09,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.487e+01 2.816e+01 3.426e+01 7.138e+01, threshold=5.633e+01, percent-clipped=2.0
2024-08-12 22:56:15,917 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 from AS
2024-08-12 22:56:18,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1869380.0, ans=0.07
2024-08-12 22:56:46,940 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13050, loss[loss=0.1183, beats_loss=0.009831, ecapa_loss=0.0001939, whisper_loss=0.1065, over 18118.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01099, ecapa_loss=0.0001717, whisper_loss=0.09093, over 3875121.08 frames. ], batch size: 72, lr: 4.77e-03, grad_scale: 1.152921504606847e+18
2024-08-12 22:57:02,354 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS
2024-08-12 22:57:32,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1869880.0, ans=0.125
2024-08-12 22:57:35,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1869880.0, ans=0.2
2024-08-12 22:57:48,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0
2024-08-12 22:57:49,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0
2024-08-12 22:57:53,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13100, loss[loss=0.09149, beats_loss=0.01219, ecapa_loss=0.000212, whisper_loss=0.07718, over 21473.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01097, ecapa_loss=0.0001698, whisper_loss=0.09097, over 3876913.19 frames. ], batch size: 94, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 22:57:57,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1870080.0, ans=0.125
2024-08-12 22:58:02,764 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 from AS
2024-08-12 22:58:13,431 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS
2024-08-12 22:58:23,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.487e+01 2.739e+01 3.111e+01 4.282e+01, threshold=5.479e+01, percent-clipped=0.0
2024-08-12 22:58:25,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0
2024-08-12 22:58:29,210 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS
2024-08-12 22:58:59,106 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 15 from LS+wenet, 17 from Vox, 31 from AS
2024-08-12 22:59:00,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13150, loss[loss=0.07836, beats_loss=0.01246, ecapa_loss=0.0001454, whisper_loss=0.06445, over 16662.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01096, ecapa_loss=0.0001699, whisper_loss=0.09094, over 3863547.63 frames. ], batch size: 63, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 22:59:03,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1870580.0, ans=0.125
2024-08-12 22:59:34,045 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 from AS
2024-08-12 22:59:41,953 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 from AS
2024-08-12 22:59:47,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1870880.0, ans=0.125
2024-08-12 23:00:03,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1870980.0, ans=0.125
2024-08-12 23:00:06,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13200, loss[loss=0.0934, beats_loss=0.0126, ecapa_loss=0.0001394, whisper_loss=0.07941, over 14410.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.00017, whisper_loss=0.09184, over 3867669.42 frames. ], batch size: 57, lr: 4.77e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:00:36,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.560e+01 2.764e+01 3.178e+01 9.126e+01, threshold=5.529e+01, percent-clipped=1.0
2024-08-12 23:00:39,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1871280.0, ans=0.1
2024-08-12 23:00:40,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1871280.0, ans=0.0
2024-08-12 23:00:41,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1871280.0, ans=0.125
2024-08-12 23:01:12,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13250, loss[loss=0.0889, beats_loss=0.01287, ecapa_loss=0.000125, whisper_loss=0.07478, over 20596.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001695, whisper_loss=0.09181, over 3889587.56 frames. ], batch size: 79, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:01:13,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1871580.0, ans=0.1
2024-08-12 23:01:28,145 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 17 from LS+wenet, 11 from Vox, 25 from AS
2024-08-12 23:01:40,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1871780.0, ans=0.07
2024-08-12 23:01:47,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1871780.0, ans=0.125
2024-08-12 23:01:47,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0
2024-08-12 23:01:56,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1871880.0, ans=0.125
2024-08-12 23:01:59,269 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 from AS
2024-08-12 23:01:59,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1871880.0, ans=0.0
2024-08-12 23:02:03,661 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:02:04,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0
2024-08-12 23:02:20,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13300, loss[loss=0.1015, beats_loss=0.01281, ecapa_loss=0.0001541, whisper_loss=0.08713, over 20682.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01084, ecapa_loss=0.0001691, whisper_loss=0.09243, over 3864667.20 frames. ], batch size: 82, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:02:23,734 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 27 from Vox, 33 from AS
2024-08-12 23:02:24,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1872080.0, ans=0.125
2024-08-12 23:02:31,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1872080.0, ans=0.125
2024-08-12 23:02:35,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1872180.0, ans=0.0
2024-08-12 23:02:40,873 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.55 vs. limit=10.0
2024-08-12 23:02:52,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.491e+01 2.756e+01 2.982e+01 7.499e+01, threshold=5.512e+01, percent-clipped=1.0
2024-08-12 23:03:14,134 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 26 from LS+wenet, 26 from Vox, 21 from AS
2024-08-12 23:03:28,643 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13350, loss[loss=0.1309, beats_loss=0.01002, ecapa_loss=0.0001642, whisper_loss=0.1193, over 23372.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01085, ecapa_loss=0.0001697, whisper_loss=0.09217, over 3856998.86 frames. ], batch size: 91, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:03:46,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1872680.0, ans=0.0
2024-08-12 23:03:49,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1872680.0, ans=0.0
2024-08-12 23:04:13,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1872880.0, ans=0.0
2024-08-12 23:04:19,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1872880.0, ans=0.125
2024-08-12 23:04:31,780 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 from AS
2024-08-12 23:04:35,497 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13400, loss[loss=0.09343, beats_loss=0.01239, ecapa_loss=0.0001614, whisper_loss=0.07942, over 21401.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001691, whisper_loss=0.09217, over 3847166.40 frames. ], batch size: 87, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:04:37,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1873080.0, ans=0.0
2024-08-12 23:04:37,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1873080.0, ans=0.125
2024-08-12 23:04:39,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1873080.0, ans=0.125
2024-08-12 23:04:50,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1873180.0, ans=0.015
2024-08-12 23:04:55,835 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 from AS
2024-08-12 23:04:57,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1873180.0, ans=0.1
2024-08-12 23:05:02,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1873280.0, ans=0.0
2024-08-12 23:05:06,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.402e+01 2.808e+01 3.201e+01 5.167e+01, threshold=5.616e+01, percent-clipped=0.0
2024-08-12 23:05:06,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0
2024-08-12 23:05:11,939 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 from AS
2024-08-12 23:05:23,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1873380.0, ans=0.0
2024-08-12 23:05:28,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0
2024-08-12 23:05:31,317 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 from AS
2024-08-12 23:05:34,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1873480.0, ans=0.125
2024-08-12 23:05:38,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1873480.0, ans=0.05
2024-08-12 23:05:39,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1873480.0, ans=0.0
2024-08-12 23:05:41,458 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13450, loss[loss=0.1004, beats_loss=0.01184, ecapa_loss=0.0001886, whisper_loss=0.08665, over 20060.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01085, ecapa_loss=0.0001695, whisper_loss=0.09194, over 3861521.02 frames. ], batch size: 81, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:05:48,930 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 from AS
2024-08-12 23:06:06,360 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 from AS
2024-08-12 23:06:34,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1873980.0, ans=0.125
2024-08-12 23:06:35,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1873980.0, ans=0.2
2024-08-12 23:06:48,025 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13500, loss[loss=0.1267, beats_loss=0.008282, ecapa_loss=0.0002031, whisper_loss=0.1164, over 20532.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001711, whisper_loss=0.09181, over 3855709.28 frames. ], batch size: 82, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:06:55,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1874080.0, ans=0.125
2024-08-12 23:06:56,800 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 from AS
2024-08-12 23:07:19,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.452e+01 2.723e+01 3.030e+01 4.696e+01, threshold=5.446e+01, percent-clipped=0.0
2024-08-12 23:07:26,254 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 12 from Vox, 31 from AS
2024-08-12 23:07:35,620 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS
2024-08-12 23:07:44,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1874480.0, ans=0.2
2024-08-12 23:07:47,251 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS
2024-08-12 23:07:47,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1874480.0, ans=0.125
2024-08-12 23:07:55,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13550, loss[loss=0.09322, beats_loss=0.01445, ecapa_loss=0.0001365, whisper_loss=0.07741, over 16443.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001717, whisper_loss=0.09202, over 3897510.79 frames. ], batch size: 64, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:07:57,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0
2024-08-12 23:08:15,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1874680.0, ans=0.125
2024-08-12 23:08:20,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1874780.0, ans=0.125
2024-08-12 23:08:28,657 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS
2024-08-12 23:08:39,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1874880.0, ans=0.125
2024-08-12 23:08:43,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1874880.0, ans=0.04949747468305833
2024-08-12 23:08:55,829 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 from AS
2024-08-12 23:08:59,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2024-08-12 23:09:00,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1874980.0, ans=0.1
2024-08-12 23:09:02,101 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13600, loss[loss=0.09896, beats_loss=0.0122, ecapa_loss=0.0001899, whisper_loss=0.08486, over 19557.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001711, whisper_loss=0.0925, over 3886018.29 frames. ], batch size: 83, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:09:06,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1875080.0, ans=0.1
2024-08-12 23:09:27,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1875280.0, ans=0.0
2024-08-12 23:09:32,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.462e+01 2.883e+01 3.310e+01 7.463e+01, threshold=5.766e+01, percent-clipped=1.0
2024-08-12 23:09:40,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1875380.0, ans=0.0
2024-08-12 23:10:00,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1875480.0, ans=0.0
2024-08-12 23:10:01,348 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 from AS
2024-08-12 23:10:07,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13650, loss[loss=0.1173, beats_loss=0.01035, ecapa_loss=0.0001743, whisper_loss=0.1052, over 22218.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001709, whisper_loss=0.092, over 3917217.36 frames. ], batch size: 88, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:10:10,280 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 17 from Vox, 37 from AS
2024-08-12 23:10:14,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1875580.0, ans=0.0
2024-08-12 23:10:18,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1875580.0, ans=0.125
2024-08-12 23:10:18,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1875580.0, ans=0.0
2024-08-12 23:10:21,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1875680.0, ans=0.125
2024-08-12 23:10:23,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1875680.0, ans=0.0
2024-08-12 23:10:35,576 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 27 from Vox, 36 from AS
2024-08-12 23:10:51,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:10:53,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1875880.0, ans=0.125
2024-08-12 23:10:54,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1875880.0, ans=0.0
2024-08-12 23:11:01,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5
2024-08-12 23:11:03,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1875980.0, ans=0.125
2024-08-12 23:11:10,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1875980.0, ans=0.0
2024-08-12 23:11:11,858 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 from AS
2024-08-12 23:11:14,538 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13700, loss[loss=0.09254, beats_loss=0.01263, ecapa_loss=0.0001287, whisper_loss=0.07862, over 16494.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001705, whisper_loss=0.09176, over 3917214.64 frames. ], batch size: 62, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:11:16,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0
2024-08-12 23:11:20,331 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 15 from Vox, 36 from AS
2024-08-12 23:11:44,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.467e+01 2.777e+01 3.137e+01 6.258e+01, threshold=5.554e+01, percent-clipped=1.0
2024-08-12 23:11:47,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0
2024-08-12 23:11:47,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0
2024-08-12 23:11:49,385 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 21 from LS+wenet, 9 from Vox, 28 from AS
2024-08-12 23:12:00,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.41 vs. limit=22.5
2024-08-12 23:12:12,431 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 from AS
2024-08-12 23:12:21,814 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13750, loss[loss=0.09848, beats_loss=0.01135, ecapa_loss=0.00021, whisper_loss=0.08503, over 20578.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001707, whisper_loss=0.09238, over 3911781.60 frames. ], batch size: 87, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:12:28,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-08-12 23:12:32,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1876580.0, ans=0.125
2024-08-12 23:12:55,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1876780.0, ans=0.1
2024-08-12 23:13:08,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1876880.0, ans=0.1
2024-08-12 23:13:18,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1876980.0, ans=0.0
2024-08-12 23:13:31,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13800, loss[loss=0.1209, beats_loss=0.01025, ecapa_loss=0.0002, whisper_loss=0.1086, over 23800.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01088, ecapa_loss=0.0001708, whisper_loss=0.09304, over 3910943.52 frames. ], batch size: 94, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:13:42,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1877080.0, ans=0.02
2024-08-12 23:13:42,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1877080.0, ans=0.0
2024-08-12 23:14:00,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-08-12 23:14:06,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.452e+01 2.663e+01 3.049e+01 4.287e+01, threshold=5.326e+01, percent-clipped=0.0
2024-08-12 23:14:16,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1877380.0, ans=0.0
2024-08-12 23:14:47,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13850, loss[loss=0.1233, beats_loss=0.01162, ecapa_loss=0.0001869, whisper_loss=0.1098, over 21350.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01089, ecapa_loss=0.0001701, whisper_loss=0.09264, over 3902002.44 frames. ], batch size: 88, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:14:49,012 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 from AS
2024-08-12 23:14:54,130 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:15:21,674 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS
2024-08-12 23:15:33,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1877880.0, ans=0.125
2024-08-12 23:15:43,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5
2024-08-12 23:15:47,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1877880.0, ans=0.125
2024-08-12 23:15:56,922 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 from AS
2024-08-12 23:16:04,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13900, loss[loss=0.09952, beats_loss=0.01331, ecapa_loss=0.0001303, whisper_loss=0.08491, over 19764.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.0109, ecapa_loss=0.00017, whisper_loss=0.09246, over 3897492.93 frames. ], batch size: 75, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:16:09,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.26 vs. limit=10.0
2024-08-12 23:16:10,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1878080.0, ans=0.125
2024-08-12 23:16:31,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1878180.0, ans=0.09899494936611666
2024-08-12 23:16:35,835 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-12 23:16:39,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.486e+01 2.775e+01 2.978e+01 4.704e+01, threshold=5.551e+01, percent-clipped=0.0
2024-08-12 23:16:41,106 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 38 from LS+wenet, 22 from Vox, 27 from AS
2024-08-12 23:16:51,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1878380.0, ans=0.125
2024-08-12 23:16:56,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1878380.0, ans=0.125
2024-08-12 23:17:19,956 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 13950, loss[loss=0.1039, beats_loss=0.01211, ecapa_loss=0.0001427, whisper_loss=0.09037, over 20291.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.0109, ecapa_loss=0.0001692, whisper_loss=0.0932, over 3920064.28 frames. ], batch size: 81, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:17:34,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0
2024-08-12 23:17:35,753 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 14 from Vox, 31 from AS
2024-08-12 23:17:43,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1878680.0, ans=0.0
2024-08-12 23:17:58,916 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 from AS
2024-08-12 23:18:12,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1878880.0, ans=0.125
2024-08-12 23:18:35,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14000, loss[loss=0.1295, beats_loss=0.009639, ecapa_loss=0.0001805, whisper_loss=0.1181, over 21949.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01091, ecapa_loss=0.0001689, whisper_loss=0.09343, over 3894612.31 frames. ], batch size: 88, lr: 4.76e-03, grad_scale: 5.764607523034235e+17
2024-08-12 23:19:01,442 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts.
32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-12 23:19:09,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.518e+01 2.898e+01 3.200e+01 5.053e+01, threshold=5.795e+01, percent-clipped=0.0 2024-08-12 23:19:14,223 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-12 23:19:21,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1879380.0, ans=0.125 2024-08-12 23:19:30,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1879380.0, ans=0.0 2024-08-12 23:19:38,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-08-12 23:19:51,641 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14050, loss[loss=0.1013, beats_loss=0.01314, ecapa_loss=0.000113, whisper_loss=0.08698, over 19743.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01094, ecapa_loss=0.0001692, whisper_loss=0.09295, over 3899249.29 frames. ], batch size: 76, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:19:53,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1879580.0, ans=0.125 2024-08-12 23:20:05,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.51 vs. 
limit=15.0 2024-08-12 23:20:18,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=1879680.0, ans=0.025 2024-08-12 23:20:53,798 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-188000.pt 2024-08-12 23:20:59,507 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-12 23:21:02,307 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-12 23:21:08,984 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14100, loss[loss=0.1081, beats_loss=0.009914, ecapa_loss=0.0002225, whisper_loss=0.09593, over 21912.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01088, ecapa_loss=0.000169, whisper_loss=0.09339, over 3914448.52 frames. ], batch size: 91, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:21:28,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1880180.0, ans=0.0 2024-08-12 23:21:44,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.402e+01 2.759e+01 3.024e+01 5.678e+01, threshold=5.519e+01, percent-clipped=0.0 2024-08-12 23:22:15,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1880480.0, ans=0.1 2024-08-12 23:22:27,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14150, loss[loss=0.1036, beats_loss=0.01123, ecapa_loss=0.0001443, whisper_loss=0.09089, over 14634.00 frames. ], tot_loss[loss=0.1064, beats_loss=0.01086, ecapa_loss=0.0001701, whisper_loss=0.09388, over 3926072.24 frames. 
], batch size: 57, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:22:35,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-12 23:22:36,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1880580.0, ans=0.125 2024-08-12 23:23:11,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1880780.0, ans=0.0 2024-08-12 23:23:30,892 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-12 23:23:35,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1880980.0, ans=0.125 2024-08-12 23:23:44,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=12.0 2024-08-12 23:23:45,684 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-12 23:23:46,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14200, loss[loss=0.1168, beats_loss=0.01096, ecapa_loss=0.0001592, whisper_loss=0.1042, over 23173.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01094, ecapa_loss=0.0001696, whisper_loss=0.09283, over 3933687.83 frames. ], batch size: 92, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:23:59,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.33 vs. 
limit=22.5 2024-08-12 23:24:23,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1881280.0, ans=0.09899494936611666 2024-08-12 23:24:24,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.554e+01 2.881e+01 3.378e+01 7.854e+01, threshold=5.762e+01, percent-clipped=3.0 2024-08-12 23:24:35,448 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-12 23:25:02,960 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-12 23:25:04,228 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 23:25:07,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14250, loss[loss=0.08237, beats_loss=0.01214, ecapa_loss=0.0001698, whisper_loss=0.06854, over 21663.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001693, whisper_loss=0.09252, over 3943595.12 frames. ], batch size: 88, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:25:11,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-12 23:25:20,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.06 vs. limit=10.0 2024-08-12 23:25:24,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-12 23:25:28,494 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-12 23:25:28,738 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.579e-03 2024-08-12 23:25:38,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1881780.0, ans=0.0 2024-08-12 23:25:48,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1881780.0, ans=0.125 2024-08-12 23:25:53,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-08-12 23:26:05,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1881880.0, ans=0.0 2024-08-12 23:26:09,827 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 23:26:11,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1881980.0, ans=10.0 2024-08-12 23:26:24,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14300, loss[loss=0.09421, beats_loss=0.01091, ecapa_loss=0.0001717, whisper_loss=0.08159, over 16809.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01086, ecapa_loss=0.0001686, whisper_loss=0.09296, over 3933854.28 frames. ], batch size: 67, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:26:26,087 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-12 23:26:26,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1882080.0, ans=0.5 2024-08-12 23:26:32,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.74 vs. 
limit=15.0 2024-08-12 23:26:35,189 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-12 23:26:58,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.532e+01 2.791e+01 3.195e+01 4.924e+01, threshold=5.583e+01, percent-clipped=0.0 2024-08-12 23:27:01,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1882280.0, ans=0.0 2024-08-12 23:27:04,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1882280.0, ans=0.0 2024-08-12 23:27:19,167 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-12 23:27:22,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1882480.0, ans=0.125 2024-08-12 23:27:22,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-12 23:27:38,287 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-12 23:27:39,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14350, loss[loss=0.08024, beats_loss=0.01251, ecapa_loss=0.0001775, whisper_loss=0.06595, over 15404.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.000169, whisper_loss=0.09211, over 3883128.25 frames. ], batch size: 65, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:27:42,986 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
40 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-12 23:27:52,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1882580.0, ans=0.125 2024-08-12 23:28:00,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1882680.0, ans=0.0 2024-08-12 23:28:08,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=12.0 2024-08-12 23:28:19,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-12 23:28:38,736 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-12 23:28:42,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-12 23:28:47,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1882980.0, ans=0.0 2024-08-12 23:28:50,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0 2024-08-12 23:28:58,889 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14400, loss[loss=0.1203, beats_loss=0.008761, ecapa_loss=0.0001926, whisper_loss=0.1096, over 18394.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001701, whisper_loss=0.09229, over 3943587.05 frames. 
], batch size: 71, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:29:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1883080.0, ans=0.0 2024-08-12 23:29:08,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1883080.0, ans=0.09899494936611666 2024-08-12 23:29:20,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1883180.0, ans=0.025 2024-08-12 23:29:30,319 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-12 23:29:30,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1883280.0, ans=0.2 2024-08-12 23:29:33,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.540e+01 2.866e+01 3.197e+01 2.206e+02, threshold=5.732e+01, percent-clipped=2.0 2024-08-12 23:29:35,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1883280.0, ans=0.1 2024-08-12 23:29:50,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1883380.0, ans=0.1 2024-08-12 23:29:54,008 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 23:29:58,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-08-12 23:30:09,817 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
35 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-12 23:30:14,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 13, batch 14450, loss[loss=0.09705, beats_loss=0.01082, ecapa_loss=0.0001702, whisper_loss=0.08453, over 16575.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0109, ecapa_loss=0.0001703, whisper_loss=0.09196, over 3925246.59 frames. ], batch size: 64, lr: 4.75e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:30:17,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1883580.0, ans=0.09899494936611666 2024-08-12 23:30:21,239 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-12 23:30:23,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-08-12 23:30:24,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1883580.0, ans=0.125 2024-08-12 23:30:34,047 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-12 23:30:38,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1883680.0, ans=0.0 2024-08-12 23:30:42,304 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-12 23:30:44,373 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.725e+01 2024-08-12 23:30:50,050 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-12 23:30:54,354 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-12 23:30:57,056 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
26 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 23:30:58,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1883880.0, ans=0.125 2024-08-12 23:31:02,243 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-12 23:31:09,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1883880.0, ans=0.0 2024-08-12 23:31:16,813 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-13.pt 2024-08-12 23:31:54,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 0, loss[loss=0.08438, beats_loss=0.01191, ecapa_loss=0.0001326, whisper_loss=0.07114, over 20739.00 frames. ], tot_loss[loss=0.08438, beats_loss=0.01191, ecapa_loss=0.0001326, whisper_loss=0.07114, over 20739.00 frames. ], batch size: 80, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:31:54,682 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-12 23:32:17,611 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.9784, 1.5571, 1.8660, 1.6205, 1.9542, 1.8352, 1.9103, 1.8241], device='cuda:0') 2024-08-12 23:32:30,915 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on ASR_libri: loss=0.2554, beats_loss=0, ecapa_loss=0.0005808, whisper_loss=0.2496, over 922467.00 frames. 2024-08-12 23:32:47,301 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on SV_voxceleb1: loss=0.004647, beats_loss=0, ecapa_loss=0.0004647, whisper_loss=0, over 939242.00 frames. 
2024-08-12 23:34:33,363 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on AT_audioset: loss=0.02401, beats_loss=0.02401, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-12 23:34:33,367 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-12 23:35:10,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1884090.0, ans=0.125 2024-08-12 23:35:20,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-12 23:35:53,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.601e+01 2.897e+01 3.214e+01 1.891e+02, threshold=5.795e+01, percent-clipped=1.0 2024-08-12 23:36:13,405 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-12 23:36:13,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1884390.0, ans=0.2 2024-08-12 23:36:35,587 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 50, loss[loss=0.1199, beats_loss=0.00856, ecapa_loss=0.0001596, whisper_loss=0.1097, over 24013.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01027, ecapa_loss=0.0001696, whisper_loss=0.09138, over 885027.71 frames. ], batch size: 93, lr: 4.58e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:36:38,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1884490.0, ans=0.2 2024-08-12 23:37:07,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1884590.0, ans=0.1 2024-08-12 23:37:30,928 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-12 23:37:53,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1884690.0, ans=0.125 2024-08-12 23:37:55,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1884790.0, ans=0.0 2024-08-12 23:38:07,939 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 37 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-12 23:38:21,157 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-12 23:38:21,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1884890.0, ans=0.0 2024-08-12 23:38:43,350 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 100, loss[loss=0.1004, beats_loss=0.009196, ecapa_loss=0.0001652, whisper_loss=0.08952, over 21427.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01014, ecapa_loss=0.0001701, whisper_loss=0.09215, over 1532401.55 frames. ], batch size: 82, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:39:02,623 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-12 23:39:19,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1885090.0, ans=0.0 2024-08-12 23:39:26,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1885090.0, ans=0.125 2024-08-12 23:40:01,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1885190.0, ans=0.125 2024-08-12 23:40:06,241 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-12 23:40:24,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+01 2.825e+01 3.064e+01 3.241e+01 4.540e+01, threshold=6.128e+01, percent-clipped=0.0 2024-08-12 23:41:09,885 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 23:41:11,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 150, loss[loss=0.1098, beats_loss=0.009981, ecapa_loss=0.0002095, whisper_loss=0.09772, over 15928.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01011, ecapa_loss=0.0001703, whisper_loss=0.09167, over 2029864.10 frames. ], batch size: 67, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:41:20,084 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-12 23:41:30,652 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 18 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-12 23:41:48,782 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-12 23:41:59,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1885590.0, ans=0.125 2024-08-12 23:43:03,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.88 vs. limit=10.0 2024-08-12 23:43:06,651 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-12 23:43:08,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1885890.0, ans=0.0 2024-08-12 23:43:11,343 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-12 23:43:15,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1885890.0, ans=0.125 2024-08-12 23:43:22,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1885990.0, ans=0.2 2024-08-12 23:43:23,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 200, loss[loss=0.1031, beats_loss=0.01128, ecapa_loss=0.0001931, whisper_loss=0.08986, over 19924.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01025, ecapa_loss=0.0001701, whisper_loss=0.09194, over 2432012.62 frames. ], batch size: 83, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:43:40,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1885990.0, ans=0.1 2024-08-12 23:43:43,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1885990.0, ans=0.125 2024-08-12 23:43:49,020 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-12 23:43:56,877 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-12 23:44:01,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-08-12 23:44:03,792 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-12 23:44:13,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1886190.0, ans=0.1 2024-08-12 23:44:28,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. 
limit=15.0 2024-08-12 23:44:40,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.587e+01 2.870e+01 3.355e+01 1.552e+02, threshold=5.741e+01, percent-clipped=1.0 2024-08-12 23:44:44,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1886290.0, ans=0.0 2024-08-12 23:44:46,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1886290.0, ans=0.0 2024-08-12 23:45:05,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2024-08-12 23:45:11,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1886390.0, ans=0.125 2024-08-12 23:45:16,705 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-12 23:45:26,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 250, loss[loss=0.08382, beats_loss=0.01502, ecapa_loss=0.0001683, whisper_loss=0.06712, over 17955.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001714, whisper_loss=0.09137, over 2720245.85 frames. 
], batch size: 75, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:45:43,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1886490.0, ans=0.1 2024-08-12 23:45:47,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1886490.0, ans=0.0 2024-08-12 23:46:11,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1886590.0, ans=0.0 2024-08-12 23:46:44,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1886790.0, ans=0.1 2024-08-12 23:46:58,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1886790.0, ans=0.2 2024-08-12 23:47:11,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1886890.0, ans=0.09899494936611666 2024-08-12 23:47:18,335 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-12 23:47:27,395 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 300, loss[loss=0.1089, beats_loss=0.009545, ecapa_loss=0.0002166, whisper_loss=0.09723, over 21511.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001713, whisper_loss=0.09097, over 2953227.30 frames. ], batch size: 88, lr: 4.57e-03, grad_scale: 5.764607523034235e+17 2024-08-12 23:47:41,735 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 23:47:44,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. 
limit=15.0 2024-08-12 23:48:16,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.332e+01 2.692e+01 3.047e+01 7.964e+01, threshold=5.385e+01, percent-clipped=1.0 2024-08-12 23:48:34,688 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-12 23:48:44,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 350, loss[loss=0.1026, beats_loss=0.01152, ecapa_loss=0.0001791, whisper_loss=0.08931, over 16806.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001705, whisper_loss=0.09042, over 3133882.47 frames. ], batch size: 68, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:49:09,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-12 23:49:11,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1887590.0, ans=0.125 2024-08-12 23:49:13,079 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-12 23:49:26,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1887690.0, ans=0.2 2024-08-12 23:49:38,855 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-12 23:49:43,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.51 vs. limit=6.0 2024-08-12 23:49:56,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. 
limit=15.0 2024-08-12 23:49:59,518 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 400, loss[loss=0.1225, beats_loss=0.009069, ecapa_loss=0.0001792, whisper_loss=0.1116, over 22592.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01067, ecapa_loss=0.0001693, whisper_loss=0.09015, over 3304666.73 frames. ], batch size: 88, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:50:02,217 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 35 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-12 23:50:06,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-12 23:50:48,895 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-12 23:50:51,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.354e+01 2.624e+01 3.158e+01 4.755e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-12 23:51:00,747 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 29 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-12 23:51:17,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 450, loss[loss=0.106, beats_loss=0.01322, ecapa_loss=0.0001243, whisper_loss=0.09155, over 15604.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001687, whisper_loss=0.09017, over 3386181.17 frames. ], batch size: 61, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:51:17,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=12.0 2024-08-12 23:51:19,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1888490.0, ans=0.1 2024-08-12 23:51:23,339 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-12 23:51:47,550 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-12 23:51:48,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=22.5 2024-08-12 23:52:10,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1888790.0, ans=0.1 2024-08-12 23:52:27,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1888890.0, ans=0.0 2024-08-12 23:52:33,510 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 500, loss[loss=0.1295, beats_loss=0.007715, ecapa_loss=0.0001824, whisper_loss=0.12, over 15674.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01067, ecapa_loss=0.0001696, whisper_loss=0.0915, over 3486276.60 frames. ], batch size: 58, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:52:43,429 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-12 23:52:43,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1888990.0, ans=0.0 2024-08-12 23:53:09,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2024-08-12 23:53:24,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.385e+01 2.695e+01 3.088e+01 5.680e+01, threshold=5.390e+01, percent-clipped=1.0 2024-08-12 23:53:29,431 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-12 23:53:40,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. 
limit=15.0 2024-08-12 23:53:51,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 550, loss[loss=0.09833, beats_loss=0.01062, ecapa_loss=0.0001221, whisper_loss=0.08649, over 20262.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01072, ecapa_loss=0.0001679, whisper_loss=0.09119, over 3593787.97 frames. ], batch size: 77, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:53:53,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1889490.0, ans=0.125 2024-08-12 23:53:56,409 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 22 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-12 23:54:02,332 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 18 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-12 23:54:08,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1889590.0, ans=0.125 2024-08-12 23:54:08,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1889590.0, ans=0.1 2024-08-12 23:54:15,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1889590.0, ans=0.0 2024-08-12 23:54:26,111 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-12 23:54:39,549 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 23 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-12 23:54:54,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. 
limit=6.0 2024-08-12 23:55:04,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1889890.0, ans=0.5 2024-08-12 23:55:07,075 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 600, loss[loss=0.07423, beats_loss=0.01014, ecapa_loss=0.0001537, whisper_loss=0.06256, over 20099.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001671, whisper_loss=0.09077, over 3642855.91 frames. ], batch size: 81, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:55:18,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1889990.0, ans=0.0 2024-08-12 23:55:20,567 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-12 23:55:30,778 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-12 23:55:42,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1890190.0, ans=0.125 2024-08-12 23:55:48,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1890190.0, ans=0.125 2024-08-12 23:55:54,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.472e+01 2.658e+01 3.015e+01 7.457e+01, threshold=5.315e+01, percent-clipped=1.0 2024-08-12 23:55:58,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.80 vs. 
limit=10.0 2024-08-12 23:56:10,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1890390.0, ans=0.1 2024-08-12 23:56:20,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 650, loss[loss=0.1128, beats_loss=0.00784, ecapa_loss=0.0002057, whisper_loss=0.1029, over 18056.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01066, ecapa_loss=0.0001691, whisper_loss=0.0916, over 3700858.68 frames. ], batch size: 68, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:56:40,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1890590.0, ans=10.0 2024-08-12 23:56:47,043 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-12 23:56:50,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1890690.0, ans=0.125 2024-08-12 23:57:06,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1890790.0, ans=0.125 2024-08-12 23:57:06,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1890790.0, ans=0.0 2024-08-12 23:57:25,777 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-12 23:57:35,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1890990.0, ans=0.04949747468305833 2024-08-12 23:57:36,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 700, loss[loss=0.08447, beats_loss=0.01287, ecapa_loss=0.0001614, whisper_loss=0.06998, over 20179.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001685, whisper_loss=0.09135, over 3717736.79 frames. 
], batch size: 82, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:57:43,839 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-12 23:58:11,612 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-12 23:58:20,332 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-12 23:58:24,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.432e+01 2.727e+01 3.024e+01 4.665e+01, threshold=5.453e+01, percent-clipped=0.0 2024-08-12 23:58:25,879 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-12 23:58:31,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1891290.0, ans=0.0 2024-08-12 23:58:38,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1891390.0, ans=0.125 2024-08-12 23:58:49,406 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 750, loss[loss=0.1206, beats_loss=0.009478, ecapa_loss=0.0001704, whisper_loss=0.1094, over 23235.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001668, whisper_loss=0.09168, over 3750431.88 frames. ], batch size: 90, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-12 23:58:53,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. 
limit=15.0 2024-08-12 23:58:56,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1891490.0, ans=0.0 2024-08-12 23:59:00,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1891490.0, ans=0.2 2024-08-12 23:59:06,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2024-08-12 23:59:13,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1891590.0, ans=0.07 2024-08-12 23:59:22,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1891690.0, ans=0.125 2024-08-12 23:59:24,827 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-12 23:59:25,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1891690.0, ans=0.125 2024-08-12 23:59:35,907 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-12 23:59:47,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-12 23:59:50,087 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-12 23:59:58,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1891890.0, ans=0.125 2024-08-13 00:00:04,018 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 800, loss[loss=0.08997, beats_loss=0.009105, ecapa_loss=0.0001661, whisper_loss=0.0792, over 20389.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01077, ecapa_loss=0.0001668, whisper_loss=0.09127, over 3777381.23 frames. ], batch size: 81, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:00:15,624 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 00:00:27,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-13 00:00:40,162 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 27 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-13 00:00:54,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.376e+01 2.556e+01 2.956e+01 7.880e+01, threshold=5.112e+01, percent-clipped=1.0 2024-08-13 00:01:19,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 850, loss[loss=0.09372, beats_loss=0.01156, ecapa_loss=0.0001812, whisper_loss=0.08034, over 15280.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01074, ecapa_loss=0.0001666, whisper_loss=0.09088, over 3771849.55 frames. ], batch size: 59, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:01:36,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1892590.0, ans=0.125 2024-08-13 00:01:40,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5 2024-08-13 00:01:52,511 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-13 00:01:54,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-13 00:02:29,239 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
32 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 00:02:31,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 900, loss[loss=0.08422, beats_loss=0.01297, ecapa_loss=0.0001769, whisper_loss=0.06948, over 15733.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.0001669, whisper_loss=0.09082, over 3776008.10 frames. ], batch size: 65, lr: 4.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:02:36,679 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 00:02:38,166 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 00:02:47,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1893090.0, ans=0.125 2024-08-13 00:02:59,598 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 00:03:04,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1893190.0, ans=0.0 2024-08-13 00:03:12,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1893190.0, ans=0.0 2024-08-13 00:03:16,905 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 00:03:19,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.408e+01 2.662e+01 2.977e+01 4.425e+01, threshold=5.325e+01, percent-clipped=0.0 2024-08-13 00:03:25,288 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-13 00:03:44,351 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 950, loss[loss=0.09674, beats_loss=0.01154, ecapa_loss=0.0001678, whisper_loss=0.08352, over 19819.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01083, ecapa_loss=0.0001654, whisper_loss=0.09057, over 3818411.31 frames. ], batch size: 81, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:03:44,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1893490.0, ans=0.125 2024-08-13 00:03:54,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1893490.0, ans=0.125 2024-08-13 00:03:56,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1893490.0, ans=0.125 2024-08-13 00:03:59,908 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-13 00:04:32,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1893790.0, ans=0.125 2024-08-13 00:04:38,161 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 00:04:41,183 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 00:04:43,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1893790.0, ans=0.125 2024-08-13 00:04:43,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1893790.0, ans=0.1 2024-08-13 00:04:44,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2024-08-13 00:04:49,091 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 00:04:58,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1893990.0, ans=0.125 2024-08-13 00:05:00,117 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1000, loss[loss=0.1111, beats_loss=0.009857, ecapa_loss=0.0002218, whisper_loss=0.09906, over 20884.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01084, ecapa_loss=0.0001646, whisper_loss=0.09025, over 3830656.81 frames. ], batch size: 89, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:05:37,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1894190.0, ans=0.0 2024-08-13 00:05:48,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.405e+01 2.688e+01 3.061e+01 4.317e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 00:05:48,378 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 00:05:48,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1894290.0, ans=0.125 2024-08-13 00:06:13,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1050, loss[loss=0.1152, beats_loss=0.008774, ecapa_loss=0.0001758, whisper_loss=0.1046, over 16845.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001638, whisper_loss=0.09049, over 3836972.80 frames. ], batch size: 65, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:06:27,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1894490.0, ans=0.125 2024-08-13 00:06:36,293 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
19 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-13 00:07:08,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1894790.0, ans=0.0 2024-08-13 00:07:12,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1894790.0, ans=0.125 2024-08-13 00:07:13,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1894790.0, ans=0.0 2024-08-13 00:07:30,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1894890.0, ans=0.125 2024-08-13 00:07:34,280 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1100, loss[loss=0.09502, beats_loss=0.00959, ecapa_loss=0.0001902, whisper_loss=0.08353, over 13970.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001643, whisper_loss=0.09121, over 3854972.55 frames. ], batch size: 55, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:07:36,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1894990.0, ans=0.1 2024-08-13 00:07:44,456 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 00:08:03,446 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 00:08:07,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.394e-01 2024-08-13 00:08:16,408 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:08:25,043 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.502e+01 2.869e+01 3.346e+01 6.186e+01, threshold=5.739e+01, percent-clipped=2.0 2024-08-13 00:08:51,193 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1150, loss[loss=0.1018, beats_loss=0.008068, ecapa_loss=0.0002394, whisper_loss=0.09134, over 21757.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001654, whisper_loss=0.09073, over 3857471.23 frames. ], batch size: 92, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:08:52,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-08-13 00:09:04,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1895490.0, ans=0.125 2024-08-13 00:09:07,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=1895590.0, ans=0.1 2024-08-13 00:09:26,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1895690.0, ans=0.125 2024-08-13 00:09:31,565 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 00:09:40,843 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 00:09:41,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. 
limit=6.0 2024-08-13 00:09:49,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1895790.0, ans=0.1 2024-08-13 00:09:55,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1895890.0, ans=0.125 2024-08-13 00:10:10,572 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1200, loss[loss=0.0929, beats_loss=0.01103, ecapa_loss=0.0001408, whisper_loss=0.08047, over 18506.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01079, ecapa_loss=0.0001655, whisper_loss=0.09022, over 3806390.76 frames. ], batch size: 71, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:10:13,612 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 00:10:24,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1896090.0, ans=0.125 2024-08-13 00:10:34,349 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.911e+00 2024-08-13 00:11:00,600 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-13 00:11:06,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.344e+01 2.617e+01 3.051e+01 6.950e+01, threshold=5.235e+01, percent-clipped=1.0 2024-08-13 00:11:13,604 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 14 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 00:11:15,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1896390.0, ans=0.0 2024-08-13 00:11:29,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. 
limit=15.0 2024-08-13 00:11:31,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1896490.0, ans=0.2 2024-08-13 00:11:31,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1250, loss[loss=0.09506, beats_loss=0.01139, ecapa_loss=0.0001494, whisper_loss=0.08217, over 20316.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01087, ecapa_loss=0.0001655, whisper_loss=0.08992, over 3825111.88 frames. ], batch size: 78, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:11:44,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1896490.0, ans=0.0 2024-08-13 00:11:44,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1896490.0, ans=0.0 2024-08-13 00:11:49,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1896590.0, ans=0.0 2024-08-13 00:12:02,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-13 00:12:22,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1896790.0, ans=0.1 2024-08-13 00:12:31,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1896790.0, ans=0.0 2024-08-13 00:12:34,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1896890.0, ans=0.125 2024-08-13 00:12:41,920 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 00:12:47,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1896890.0, ans=0.05 2024-08-13 00:12:50,302 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1300, loss[loss=0.08103, beats_loss=0.01109, ecapa_loss=0.0001036, whisper_loss=0.0689, over 14976.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01085, ecapa_loss=0.000164, whisper_loss=0.09006, over 3823368.13 frames. ], batch size: 53, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:13:15,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1897090.0, ans=0.1 2024-08-13 00:13:17,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1897090.0, ans=0.1 2024-08-13 00:13:23,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1897190.0, ans=0.2 2024-08-13 00:13:32,470 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 00:13:43,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.447e+01 2.732e+01 3.060e+01 1.003e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-13 00:13:48,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1897290.0, ans=0.125 2024-08-13 00:14:12,435 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1350, loss[loss=0.1002, beats_loss=0.01059, ecapa_loss=0.000157, whisper_loss=0.08803, over 23152.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01086, ecapa_loss=0.0001631, whisper_loss=0.09097, over 3847876.54 frames. 
], batch size: 89, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:14:16,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1897490.0, ans=0.0 2024-08-13 00:14:17,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1897490.0, ans=0.125 2024-08-13 00:14:44,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1897690.0, ans=0.0 2024-08-13 00:14:57,299 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 00:15:11,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1897790.0, ans=0.2 2024-08-13 00:15:11,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1897790.0, ans=0.125 2024-08-13 00:15:13,105 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 00:15:15,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-13 00:15:28,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1897890.0, ans=0.0 2024-08-13 00:15:33,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1400, loss[loss=0.1099, beats_loss=0.01012, ecapa_loss=0.0001611, whisper_loss=0.09819, over 16562.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01077, ecapa_loss=0.0001628, whisper_loss=0.09138, over 3829871.27 frames. 
], batch size: 66, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:15:36,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1897990.0, ans=0.125 2024-08-13 00:16:01,999 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 00:16:17,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-13 00:16:19,680 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 25 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-13 00:16:25,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.414e+01 2.708e+01 3.137e+01 5.162e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-13 00:16:29,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1898290.0, ans=0.0 2024-08-13 00:16:32,820 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 00:16:37,538 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 00:16:54,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1450, loss[loss=0.09746, beats_loss=0.01035, ecapa_loss=0.0001491, whisper_loss=0.08562, over 20287.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001619, whisper_loss=0.09119, over 3849366.77 frames. ], batch size: 78, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:16:55,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1898490.0, ans=15.0 2024-08-13 00:17:37,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.16 vs. 
limit=15.0 2024-08-13 00:18:00,628 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 00:18:01,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-08-13 00:18:10,951 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 00:18:29,617 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 35 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 00:18:41,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2024-08-13 00:18:42,212 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 17 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-13 00:18:43,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1500, loss[loss=0.08089, beats_loss=0.01053, ecapa_loss=0.0002591, whisper_loss=0.06777, over 15937.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01089, ecapa_loss=0.0001626, whisper_loss=0.09052, over 3821617.16 frames. ], batch size: 71, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:18:52,851 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 00:18:53,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1898990.0, ans=0.0 2024-08-13 00:19:19,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=1899190.0, ans=12.0 2024-08-13 00:19:28,774 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 00:19:34,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1899290.0, ans=0.0 2024-08-13 00:19:35,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.418e+01 2.688e+01 3.116e+01 4.487e+01, threshold=5.376e+01, percent-clipped=0.0 2024-08-13 00:19:40,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1899290.0, ans=0.0 2024-08-13 00:19:51,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=12.0 2024-08-13 00:19:59,467 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 00:20:02,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1550, loss[loss=0.09751, beats_loss=0.008356, ecapa_loss=0.0001534, whisper_loss=0.08762, over 17327.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01096, ecapa_loss=0.000163, whisper_loss=0.08955, over 3793221.72 frames. ], batch size: 65, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:20:05,780 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 00:20:11,974 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 00:20:27,798 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 00:20:36,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=15.0 2024-08-13 00:20:40,037 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:20:40,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-13 00:20:50,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1899790.0, ans=0.125 2024-08-13 00:20:55,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1899790.0, ans=0.0 2024-08-13 00:20:55,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1899790.0, ans=0.0 2024-08-13 00:21:10,604 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 00:21:12,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1899890.0, ans=0.1 2024-08-13 00:21:20,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1600, loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001672, whisper_loss=0.09054, over 16686.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01095, ecapa_loss=0.000162, whisper_loss=0.08983, over 3809753.68 frames. ], batch size: 67, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:21:22,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1899990.0, ans=0.1 2024-08-13 00:21:51,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-13 00:21:52,246 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-13 00:21:56,868 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 37 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 00:22:01,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900190.0, ans=0.1 2024-08-13 00:22:12,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.583e+01 2.856e+01 3.340e+01 1.108e+02, threshold=5.712e+01, percent-clipped=2.0 2024-08-13 00:22:16,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900290.0, ans=0.1 2024-08-13 00:22:26,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1900390.0, ans=0.2 2024-08-13 00:22:30,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1900390.0, ans=0.025 2024-08-13 00:22:38,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1650, loss[loss=0.09417, beats_loss=0.01244, ecapa_loss=0.0001966, whisper_loss=0.07977, over 21359.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01096, ecapa_loss=0.0001613, whisper_loss=0.0901, over 3811511.99 frames. 
], batch size: 89, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:22:51,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1900590.0, ans=0.2 2024-08-13 00:22:53,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1900590.0, ans=0.0 2024-08-13 00:22:59,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1900590.0, ans=0.125 2024-08-13 00:23:19,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900690.0, ans=0.1 2024-08-13 00:23:37,738 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-13 00:23:51,678 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 00:23:53,339 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1700, loss[loss=0.09112, beats_loss=0.01232, ecapa_loss=0.0001854, whisper_loss=0.07695, over 20878.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01088, ecapa_loss=0.0001633, whisper_loss=0.09061, over 3798104.99 frames. 
], batch size: 90, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:23:55,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1900990.0, ans=0.2 2024-08-13 00:24:02,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900990.0, ans=0.1 2024-08-13 00:24:04,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1900990.0, ans=0.1 2024-08-13 00:24:14,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1901090.0, ans=0.2 2024-08-13 00:24:21,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-08-13 00:24:24,875 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-13 00:24:29,540 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 00:24:42,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.360e+01 2.688e+01 2.973e+01 4.042e+01, threshold=5.375e+01, percent-clipped=0.0 2024-08-13 00:24:59,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1901390.0, ans=0.1 2024-08-13 00:25:05,083 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 00:25:07,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1750, loss[loss=0.1125, beats_loss=0.008516, ecapa_loss=0.0001512, whisper_loss=0.1025, over 22869.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01091, ecapa_loss=0.0001622, whisper_loss=0.09038, over 3822874.51 frames. 
], batch size: 89, lr: 4.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:25:18,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1901490.0, ans=0.125 2024-08-13 00:25:19,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1901490.0, ans=0.0 2024-08-13 00:25:39,333 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 00:25:41,042 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 00:25:55,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-13 00:26:20,716 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1800, loss[loss=0.09591, beats_loss=0.01027, ecapa_loss=0.0001614, whisper_loss=0.08403, over 18662.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01083, ecapa_loss=0.0001638, whisper_loss=0.0908, over 3832509.22 frames. ], batch size: 76, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:26:24,784 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
32 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 00:26:25,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1901990.0, ans=0.125 2024-08-13 00:26:32,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1901990.0, ans=0.125 2024-08-13 00:26:32,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1901990.0, ans=0.125 2024-08-13 00:26:32,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1901990.0, ans=0.125 2024-08-13 00:26:37,198 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 00:26:37,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.37 vs. limit=15.0 2024-08-13 00:26:39,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1902090.0, ans=0.125 2024-08-13 00:26:42,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2024-08-13 00:26:56,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1902190.0, ans=0.1 2024-08-13 00:27:02,742 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 00:27:09,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1902290.0, ans=0.0 2024-08-13 00:27:12,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.455e+01 2.703e+01 3.083e+01 4.143e+01, threshold=5.406e+01, percent-clipped=0.0 2024-08-13 00:27:30,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2024-08-13 00:27:40,221 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1850, loss[loss=0.08386, beats_loss=0.01148, ecapa_loss=0.0002032, whisper_loss=0.07035, over 18101.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01079, ecapa_loss=0.0001637, whisper_loss=0.0908, over 3834158.98 frames. ], batch size: 75, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:28:03,568 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 00:28:55,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1902890.0, ans=0.125 2024-08-13 00:28:57,593 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-13 00:29:00,950 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1900, loss[loss=0.09548, beats_loss=0.01179, ecapa_loss=0.0001434, whisper_loss=0.08226, over 14369.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001646, whisper_loss=0.09042, over 3781957.22 frames. 
], batch size: 55, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:29:05,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1902990.0, ans=0.95 2024-08-13 00:29:05,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1902990.0, ans=0.1 2024-08-13 00:29:13,216 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 00:29:22,932 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 00:29:38,566 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 00:29:53,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.471e+01 2.746e+01 3.040e+01 5.075e+01, threshold=5.492e+01, percent-clipped=0.0 2024-08-13 00:29:57,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1903290.0, ans=0.125 2024-08-13 00:30:20,530 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 1950, loss[loss=0.1105, beats_loss=0.01154, ecapa_loss=0.0001636, whisper_loss=0.0973, over 23389.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001652, whisper_loss=0.09084, over 3769815.62 frames. 
], batch size: 91, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:30:25,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1903490.0, ans=0.125 2024-08-13 00:30:40,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1903590.0, ans=0.0 2024-08-13 00:30:40,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1903590.0, ans=0.09899494936611666 2024-08-13 00:30:55,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1903690.0, ans=0.125 2024-08-13 00:31:01,515 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-13 00:31:12,471 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-13 00:31:14,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1903790.0, ans=0.125 2024-08-13 00:31:15,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-08-13 00:31:27,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1903890.0, ans=0.0 2024-08-13 00:31:32,749 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 26 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-13 00:31:37,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1903990.0, ans=0.125 2024-08-13 00:31:39,293 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2000, loss[loss=0.124, beats_loss=0.009144, ecapa_loss=0.0001747, whisper_loss=0.1131, over 22925.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.0107, ecapa_loss=0.0001648, whisper_loss=0.09084, over 3756268.15 frames. ], batch size: 90, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:31:46,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2024-08-13 00:31:56,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1904090.0, ans=0.125 2024-08-13 00:32:07,217 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 00:32:30,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.391e+01 2.734e+01 3.144e+01 4.841e+01, threshold=5.468e+01, percent-clipped=0.0 2024-08-13 00:32:36,220 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 00:32:37,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1904290.0, ans=0.125 2024-08-13 00:32:37,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1904290.0, ans=0.125 2024-08-13 00:32:45,661 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 00:32:45,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1904390.0, ans=0.125 2024-08-13 00:32:46,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=22.5 2024-08-13 00:32:56,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2050, loss[loss=0.09823, beats_loss=0.012, ecapa_loss=0.0001828, whisper_loss=0.0844, over 18447.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001644, whisper_loss=0.09026, over 3762127.15 frames. ], batch size: 75, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:33:08,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-13 00:33:20,229 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 00:33:20,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1904590.0, ans=0.0 2024-08-13 00:33:40,777 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 00:33:44,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2024-08-13 00:33:53,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2024-08-13 00:34:12,437 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2100, loss[loss=0.1188, beats_loss=0.011, ecapa_loss=0.0001472, whisper_loss=0.1063, over 16855.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01075, ecapa_loss=0.000164, whisper_loss=0.09076, over 3772434.15 frames. ], batch size: 64, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:34:14,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1904990.0, ans=0.0 2024-08-13 00:34:34,493 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
15 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 00:34:43,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1905190.0, ans=0.125 2024-08-13 00:34:51,142 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 00:34:51,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1905190.0, ans=0.0 2024-08-13 00:35:03,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.317e+01 2.588e+01 2.864e+01 4.791e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-13 00:35:08,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1905290.0, ans=0.2 2024-08-13 00:35:10,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1905290.0, ans=0.2 2024-08-13 00:35:16,102 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 38 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 00:35:27,876 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 29 from Vox, 21 fro AS 2024-08-13 00:35:29,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2150, loss[loss=0.112, beats_loss=0.007978, ecapa_loss=0.0002286, whisper_loss=0.1018, over 18140.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01082, ecapa_loss=0.0001636, whisper_loss=0.09101, over 3770579.65 frames. ], batch size: 76, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:35:38,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:35:55,441 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 23 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-13 00:36:05,624 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 00:36:18,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1905790.0, ans=0.125 2024-08-13 00:36:20,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2024-08-13 00:36:43,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1905890.0, ans=0.125 2024-08-13 00:36:43,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-13 00:36:51,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2200, loss[loss=0.118, beats_loss=0.0103, ecapa_loss=0.0001876, whisper_loss=0.1058, over 22827.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.0001627, whisper_loss=0.09133, over 3792736.80 frames. ], batch size: 92, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:36:51,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1905990.0, ans=0.0 2024-08-13 00:37:09,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1906090.0, ans=0.125 2024-08-13 00:37:16,312 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-13 00:37:17,549 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-13 00:37:21,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1906090.0, ans=0.1 2024-08-13 00:37:29,635 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
13 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 00:37:45,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.358e+01 2.742e+01 3.274e+01 9.057e+01, threshold=5.483e+01, percent-clipped=3.0 2024-08-13 00:37:47,653 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-13 00:37:48,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-13 00:38:00,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1906390.0, ans=0.125 2024-08-13 00:38:02,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1906390.0, ans=0.125 2024-08-13 00:38:09,051 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 19 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 00:38:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1906390.0, ans=0.125 2024-08-13 00:38:13,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2250, loss[loss=0.1091, beats_loss=0.01188, ecapa_loss=0.0001529, whisper_loss=0.09569, over 23463.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.0001664, whisper_loss=0.09124, over 3787299.59 frames. ], batch size: 90, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:38:18,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1906490.0, ans=0.125 2024-08-13 00:38:39,401 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=12.0 2024-08-13 00:38:40,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1906590.0, ans=0.0 2024-08-13 00:39:20,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1906890.0, ans=0.0 2024-08-13 00:39:27,015 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 38 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 00:39:30,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1906890.0, ans=0.125 2024-08-13 00:39:37,954 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2300, loss[loss=0.1126, beats_loss=0.008969, ecapa_loss=0.0001858, whisper_loss=0.1018, over 17877.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001659, whisper_loss=0.09222, over 3843768.22 frames. ], batch size: 70, lr: 4.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 00:39:46,049 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 00:40:10,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1907190.0, ans=0.125 2024-08-13 00:40:18,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1907190.0, ans=0.125 2024-08-13 00:40:20,145 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 00:40:28,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1907290.0, ans=0.125 2024-08-13 00:40:32,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.474e+01 2.795e+01 3.232e+01 6.818e+01, threshold=5.590e+01, percent-clipped=1.0 2024-08-13 00:40:36,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1907290.0, ans=0.125 2024-08-13 00:40:41,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1907290.0, ans=0.0 2024-08-13 00:40:45,589 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 00:40:47,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1907390.0, ans=0.125 2024-08-13 00:40:55,121 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.183e+00 2024-08-13 00:41:00,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2350, loss[loss=0.1043, beats_loss=0.01028, ecapa_loss=0.0001468, whisper_loss=0.0926, over 19700.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01087, ecapa_loss=0.0001671, whisper_loss=0.09232, over 3854120.10 frames. 
], batch size: 75, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:41:00,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1907490.0, ans=0.2 2024-08-13 00:41:06,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1907490.0, ans=0.125 2024-08-13 00:41:23,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1907590.0, ans=0.1 2024-08-13 00:41:38,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=22.5 2024-08-13 00:41:42,195 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 28 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 00:42:03,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1907790.0, ans=0.125 2024-08-13 00:42:23,039 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2400, loss[loss=0.09024, beats_loss=0.01468, ecapa_loss=0.0002006, whisper_loss=0.07355, over 18269.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01082, ecapa_loss=0.0001679, whisper_loss=0.09285, over 3893143.24 frames. ], batch size: 77, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:42:44,844 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-13 00:42:47,199 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 00:42:47,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1908090.0, ans=0.125 2024-08-13 00:42:50,951 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 00:43:04,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1908190.0, ans=0.0 2024-08-13 00:43:16,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.472e+01 2.673e+01 3.015e+01 1.435e+02, threshold=5.346e+01, percent-clipped=1.0 2024-08-13 00:43:35,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1908390.0, ans=0.125 2024-08-13 00:43:35,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1908390.0, ans=0.1 2024-08-13 00:43:37,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1908390.0, ans=0.125 2024-08-13 00:43:45,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2450, loss[loss=0.1167, beats_loss=0.008994, ecapa_loss=0.0001783, whisper_loss=0.1059, over 17104.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01077, ecapa_loss=0.0001675, whisper_loss=0.09304, over 3890860.19 frames. ], batch size: 67, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:44:00,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1908590.0, ans=0.125 2024-08-13 00:44:26,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1908690.0, ans=0.0 2024-08-13 00:44:32,228 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 00:44:45,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-13 00:44:46,748 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
11 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 00:45:02,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2024-08-13 00:45:06,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2500, loss[loss=0.1185, beats_loss=0.006951, ecapa_loss=0.0002117, whisper_loss=0.1094, over 16803.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01074, ecapa_loss=0.0001679, whisper_loss=0.09278, over 3878561.25 frames. ], batch size: 66, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:45:11,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1908990.0, ans=0.2 2024-08-13 00:45:17,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1908990.0, ans=0.125 2024-08-13 00:45:53,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1909190.0, ans=0.0 2024-08-13 00:46:01,282 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+01 2.554e+01 2.851e+01 3.287e+01 4.773e+01, threshold=5.702e+01, percent-clipped=0.0 2024-08-13 00:46:15,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1909390.0, ans=0.125 2024-08-13 00:46:17,930 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 00:46:31,423 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2550, loss[loss=0.1259, beats_loss=0.009047, ecapa_loss=0.0001476, whisper_loss=0.1153, over 22564.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01069, ecapa_loss=0.0001678, whisper_loss=0.09289, over 3872187.63 frames. 
], batch size: 86, lr: 4.55e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:46:31,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1909490.0, ans=0.04949747468305833 2024-08-13 00:46:38,101 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 00:46:44,307 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 29 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 00:46:46,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-13 00:46:49,037 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 23 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-13 00:46:54,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1909590.0, ans=0.125 2024-08-13 00:47:25,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2024-08-13 00:47:26,743 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-13 00:47:31,173 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 00:47:34,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1909890.0, ans=0.0 2024-08-13 00:47:38,672 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 00:47:42,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1909890.0, ans=0.125 2024-08-13 00:47:50,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1909890.0, ans=0.125 2024-08-13 00:47:53,964 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2600, loss[loss=0.09944, beats_loss=0.01266, ecapa_loss=0.0001181, whisper_loss=0.0856, over 17546.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01072, ecapa_loss=0.0001671, whisper_loss=0.09291, over 3887799.26 frames. ], batch size: 68, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:48:03,322 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 00:48:09,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1909990.0, ans=0.0 2024-08-13 00:48:25,374 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-13 00:48:48,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1910290.0, ans=0.1 2024-08-13 00:48:49,431 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 21 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-13 00:48:52,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.514e+01 2.741e+01 3.048e+01 4.490e+01, threshold=5.482e+01, percent-clipped=0.0 2024-08-13 00:48:52,488 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
26 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 00:49:03,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1910390.0, ans=0.0 2024-08-13 00:49:05,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1910390.0, ans=0.125 2024-08-13 00:49:07,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1910390.0, ans=0.02 2024-08-13 00:49:21,802 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2650, loss[loss=0.09745, beats_loss=0.01172, ecapa_loss=0.0001627, whisper_loss=0.08411, over 15847.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001675, whisper_loss=0.09231, over 3893779.75 frames. ], batch size: 61, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:49:22,032 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-13 00:49:24,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1910490.0, ans=0.125 2024-08-13 00:49:33,562 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:49:35,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-13 00:49:43,342 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 00:49:47,905 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 00:50:00,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1910690.0, ans=0.125 2024-08-13 00:50:01,392 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
25 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 00:50:14,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2024-08-13 00:50:41,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1910890.0, ans=0.125 2024-08-13 00:50:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1910890.0, ans=0.125 2024-08-13 00:50:43,724 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2700, loss[loss=0.08226, beats_loss=0.01256, ecapa_loss=0.0001592, whisper_loss=0.0681, over 18230.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01078, ecapa_loss=0.000167, whisper_loss=0.09181, over 3886499.06 frames. ], batch size: 73, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:51:10,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-13 00:51:13,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1911090.0, ans=0.0 2024-08-13 00:51:38,035 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.492e+01 2.764e+01 3.227e+01 2.218e+02, threshold=5.527e+01, percent-clipped=1.0 2024-08-13 00:51:43,639 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 00:51:49,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1911390.0, ans=0.125 2024-08-13 00:51:51,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1911390.0, ans=0.125 2024-08-13 00:52:03,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1911390.0, ans=0.2 2024-08-13 00:52:06,519 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2750, loss[loss=0.0943, beats_loss=0.0118, ecapa_loss=0.0001255, whisper_loss=0.08124, over 14139.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001667, whisper_loss=0.09203, over 3899891.09 frames. ], batch size: 53, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:52:07,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-13 00:52:15,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1911490.0, ans=0.125 2024-08-13 00:52:16,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. 
limit=15.0 2024-08-13 00:52:17,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1911490.0, ans=0.2 2024-08-13 00:52:17,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1911490.0, ans=0.125 2024-08-13 00:52:24,635 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 00:52:29,346 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 00:52:29,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0 2024-08-13 00:52:40,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1911690.0, ans=0.2 2024-08-13 00:52:40,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1911690.0, ans=0.125 2024-08-13 00:52:42,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-13 00:52:59,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1911790.0, ans=0.2 2024-08-13 00:53:24,103 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 00:53:31,191 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2800, loss[loss=0.1169, beats_loss=0.01126, ecapa_loss=0.0001829, whisper_loss=0.1038, over 21423.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01076, ecapa_loss=0.0001664, whisper_loss=0.09257, over 3888555.79 frames. 
], batch size: 87, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:53:37,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1911990.0, ans=0.2 2024-08-13 00:53:37,857 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.066e+00 2024-08-13 00:53:43,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1911990.0, ans=0.2 2024-08-13 00:53:49,335 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 00:54:05,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1912190.0, ans=0.0 2024-08-13 00:54:28,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.473e+01 2.733e+01 3.017e+01 4.460e+01, threshold=5.467e+01, percent-clipped=0.0 2024-08-13 00:54:34,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=12.0 2024-08-13 00:54:57,323 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2850, loss[loss=0.1087, beats_loss=0.01219, ecapa_loss=0.0001609, whisper_loss=0.09488, over 22415.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01078, ecapa_loss=0.000167, whisper_loss=0.09259, over 3894317.13 frames. ], batch size: 93, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:55:11,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1912490.0, ans=0.1 2024-08-13 00:55:23,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.45 vs. limit=12.0 2024-08-13 00:55:27,666 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
16 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 00:55:41,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=1912690.0, ans=0.95 2024-08-13 00:56:01,091 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 00:56:04,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1912890.0, ans=0.125 2024-08-13 00:56:04,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1912890.0, ans=0.1 2024-08-13 00:56:09,440 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 00:56:20,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2900, loss[loss=0.09758, beats_loss=0.011, ecapa_loss=0.0002044, whisper_loss=0.08454, over 20951.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001673, whisper_loss=0.09185, over 3850572.02 frames. ], batch size: 90, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:56:36,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1913090.0, ans=0.0 2024-08-13 00:56:38,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1913090.0, ans=0.2 2024-08-13 00:56:55,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. 
limit=15.0 2024-08-13 00:57:18,751 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.481e+01 2.818e+01 3.186e+01 4.138e+01, threshold=5.637e+01, percent-clipped=0.0 2024-08-13 00:57:24,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=15.0 2024-08-13 00:57:32,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=12.0 2024-08-13 00:57:37,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1913390.0, ans=0.0 2024-08-13 00:57:41,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5 2024-08-13 00:57:45,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 2950, loss[loss=0.06883, beats_loss=0.0132, ecapa_loss=0.0001414, whisper_loss=0.05422, over 15231.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.000169, whisper_loss=0.09124, over 3813292.32 frames. ], batch size: 62, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:57:53,840 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.933e-02 2024-08-13 00:57:58,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-13 00:58:46,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2024-08-13 00:58:49,100 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
22 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-13 00:58:49,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1913890.0, ans=0.125 2024-08-13 00:58:50,536 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-13 00:58:53,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=1913890.0, ans=0.95 2024-08-13 00:59:00,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1913890.0, ans=0.125 2024-08-13 00:59:01,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1913990.0, ans=0.09899494936611666 2024-08-13 00:59:02,719 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3000, loss[loss=0.09746, beats_loss=0.01172, ecapa_loss=0.0001732, whisper_loss=0.08401, over 22186.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001687, whisper_loss=0.09154, over 3856625.92 frames. ], batch size: 93, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 00:59:02,720 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 00:59:43,453 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005759, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 01:00:02,164 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on SV_voxceleb1: loss=0.004628, beats_loss=0, ecapa_loss=0.0004628, whisper_loss=0, over 939242.00 frames. 2024-08-13 01:01:59,755 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on AT_audioset: loss=0.02407, beats_loss=0.02407, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 01:01:59,759 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 01:02:19,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1914090.0, ans=0.0 2024-08-13 01:02:27,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.23 vs. limit=6.0 2024-08-13 01:02:35,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1914190.0, ans=0.125 2024-08-13 01:02:37,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1914190.0, ans=0.1 2024-08-13 01:02:42,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2024-08-13 01:02:50,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+01 2.523e+01 2.729e+01 3.233e+01 5.051e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-13 01:02:54,723 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 01:03:04,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1914390.0, ans=0.1 2024-08-13 01:03:05,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1914390.0, ans=0.125 2024-08-13 01:03:16,483 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3050, loss[loss=0.09475, beats_loss=0.01347, ecapa_loss=0.0001171, whisper_loss=0.08011, over 22021.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01098, ecapa_loss=0.0001676, whisper_loss=0.09135, over 3881505.23 frames. 
], batch size: 87, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:03:44,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1914690.0, ans=0.125 2024-08-13 01:03:50,083 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 01:03:57,023 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-13 01:03:59,675 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-13 01:04:07,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-08-13 01:04:10,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1914790.0, ans=0.125 2024-08-13 01:04:14,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=15.74 vs. limit=15.0 2024-08-13 01:04:27,347 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 01:04:29,108 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 01:04:30,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3100, loss[loss=0.1167, beats_loss=0.009559, ecapa_loss=0.000157, whisper_loss=0.1056, over 22685.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.0001676, whisper_loss=0.0917, over 3880011.42 frames. ], batch size: 89, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:04:30,444 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 01:04:34,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1914990.0, ans=0.125 2024-08-13 01:04:47,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1915090.0, ans=0.2 2024-08-13 01:04:54,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1915090.0, ans=0.2 2024-08-13 01:04:58,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1915190.0, ans=0.125 2024-08-13 01:05:06,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1915190.0, ans=0.0 2024-08-13 01:05:18,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.430e+01 2.726e+01 3.080e+01 5.396e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 01:05:33,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1915390.0, ans=0.0 2024-08-13 01:05:34,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1915390.0, ans=0.2 2024-08-13 01:05:36,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-13 01:05:37,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1915390.0, ans=0.125 2024-08-13 01:05:44,505 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3150, loss[loss=0.08983, beats_loss=0.01087, ecapa_loss=0.0002198, whisper_loss=0.07676, over 13717.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01098, ecapa_loss=0.0001676, whisper_loss=0.09174, over 3861596.89 frames. 
], batch size: 56, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:05:50,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-13 01:06:38,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1915790.0, ans=0.0 2024-08-13 01:06:43,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1915890.0, ans=0.0 2024-08-13 01:06:44,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1915890.0, ans=0.125 2024-08-13 01:06:45,518 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 28 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 01:06:51,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1915890.0, ans=0.2 2024-08-13 01:06:55,378 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 01:06:58,410 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3200, loss[loss=0.1023, beats_loss=0.00971, ecapa_loss=0.0001906, whisper_loss=0.09072, over 19838.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01084, ecapa_loss=0.0001684, whisper_loss=0.09272, over 3870958.81 frames. ], batch size: 81, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:07:20,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1916090.0, ans=0.125 2024-08-13 01:07:26,520 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 31 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 01:07:43,425 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 
23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-13 01:07:45,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.363e+01 2.691e+01 2.946e+01 6.786e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 01:07:49,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1916290.0, ans=0.125 2024-08-13 01:07:50,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1916290.0, ans=0.2 2024-08-13 01:08:06,937 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 01:08:10,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3250, loss[loss=0.09808, beats_loss=0.01344, ecapa_loss=0.0001359, whisper_loss=0.08327, over 22897.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01079, ecapa_loss=0.0001698, whisper_loss=0.09283, over 3883219.08 frames. ], batch size: 92, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:08:12,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-13 01:08:13,798 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
13 from LS+wenet, 22 from Vox, 24 from AS 2024-08-13 01:08:16,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1916490.0, ans=0.1 2024-08-13 01:08:20,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1916490.0, ans=0.0 2024-08-13 01:08:29,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1916590.0, ans=0.125 2024-08-13 01:09:00,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1916790.0, ans=0.125 2024-08-13 01:09:06,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1916790.0, ans=0.0 2024-08-13 01:09:25,371 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3300, loss[loss=0.09633, beats_loss=0.01096, ecapa_loss=0.0001785, whisper_loss=0.08359, over 18784.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01086, ecapa_loss=0.0001703, whisper_loss=0.09206, over 3858784.78 frames. ], batch size: 75, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:09:45,361 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 16 from Vox, 33 from AS 2024-08-13 01:09:50,607 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 01:09:55,294 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:10:01,293 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 from AS 2024-08-13 01:10:06,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. 
limit=15.0 2024-08-13 01:10:13,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.401e+01 2.681e+01 3.036e+01 4.663e+01, threshold=5.362e+01, percent-clipped=0.0 2024-08-13 01:10:29,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. limit=10.0 2024-08-13 01:10:38,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3350, loss[loss=0.1233, beats_loss=0.0104, ecapa_loss=0.0001595, whisper_loss=0.1113, over 22420.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001704, whisper_loss=0.09183, over 3861721.19 frames. ], batch size: 88, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:10:53,056 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 from AS 2024-08-13 01:10:56,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1917590.0, ans=0.125 2024-08-13 01:10:59,567 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 01:11:03,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1917590.0, ans=0.1 2024-08-13 01:11:16,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.44 vs. 
limit=15.0 2024-08-13 01:11:38,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1917790.0, ans=0.125 2024-08-13 01:11:42,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1917890.0, ans=0.0 2024-08-13 01:11:55,989 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3400, loss[loss=0.09805, beats_loss=0.01376, ecapa_loss=0.0001337, whisper_loss=0.08295, over 23239.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01099, ecapa_loss=0.0001684, whisper_loss=0.09179, over 3896604.47 frames. ], batch size: 89, lr: 4.54e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:12:11,688 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:12:11,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1918090.0, ans=0.125 2024-08-13 01:12:45,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.443e+01 2.703e+01 3.105e+01 5.409e+01, threshold=5.407e+01, percent-clipped=1.0 2024-08-13 01:12:50,047 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-13 01:13:02,095 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 28 from Vox, 39 from AS 2024-08-13 01:13:05,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1918390.0, ans=0.125 2024-08-13 01:13:06,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1918390.0, ans=0.125 2024-08-13 01:13:06,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1918390.0, ans=0.0 2024-08-13 01:13:10,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3450, loss[loss=0.09927, beats_loss=0.01165, ecapa_loss=0.000209, whisper_loss=0.08554, over 18653.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01092, ecapa_loss=0.0001696, whisper_loss=0.09141, over 3877319.86 frames. ], batch size: 77, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:13:11,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1918490.0, ans=0.0 2024-08-13 01:13:12,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1918490.0, ans=0.0 2024-08-13 01:13:15,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1918490.0, ans=0.05 2024-08-13 01:13:20,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1918490.0, ans=0.125 2024-08-13 01:13:36,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1918590.0, ans=0.125 2024-08-13 01:13:37,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1918690.0, ans=0.125 2024-08-13 01:13:39,454 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1918690.0, ans=10.0 2024-08-13 01:14:05,497 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 from AS 2024-08-13 01:14:20,062 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3500, loss[loss=0.1018, beats_loss=0.0111, ecapa_loss=0.0001747, whisper_loss=0.08893, over 21287.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001704, whisper_loss=0.09185, over 3889373.51 frames. ], batch size: 88, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:14:34,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1919090.0, ans=0.0 2024-08-13 01:14:34,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1919090.0, ans=0.125 2024-08-13 01:14:41,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1919090.0, ans=0.125 2024-08-13 01:14:41,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-08-13 01:14:48,062 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 17 from Vox, 36 from AS 2024-08-13 01:14:48,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1919190.0, ans=0.125 2024-08-13 01:15:05,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. 
limit=15.0 2024-08-13 01:15:05,870 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.466e+01 2.782e+01 3.112e+01 6.873e+01, threshold=5.565e+01, percent-clipped=2.0 2024-08-13 01:15:06,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1919290.0, ans=0.125 2024-08-13 01:15:14,267 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 15 from Vox, 50 from AS 2024-08-13 01:15:15,586 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 from AS 2024-08-13 01:15:29,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3550, loss[loss=0.1015, beats_loss=0.01222, ecapa_loss=0.0001448, whisper_loss=0.08784, over 18777.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01094, ecapa_loss=0.0001686, whisper_loss=0.09087, over 3857887.76 frames. ], batch size: 73, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:15:33,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1919490.0, ans=0.0 2024-08-13 01:15:42,686 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 from AS 2024-08-13 01:15:44,146 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 23 from LS+wenet, 24 from Vox, 22 from AS 2024-08-13 01:15:45,582 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 from AS 2024-08-13 01:15:56,868 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 from AS 2024-08-13 01:16:01,180 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 from AS 2024-08-13 01:16:16,998 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 from AS 2024-08-13 01:16:22,609 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
24 from LS+wenet, 28 from Vox, 36 from AS 2024-08-13 01:16:24,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-08-13 01:16:40,823 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-192000.pt 2024-08-13 01:16:43,712 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3600, loss[loss=0.09717, beats_loss=0.01345, ecapa_loss=0.0001277, whisper_loss=0.08244, over 21397.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001693, whisper_loss=0.09112, over 3860726.77 frames. ], batch size: 83, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:17:02,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1920090.0, ans=0.2 2024-08-13 01:17:19,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1920190.0, ans=0.1 2024-08-13 01:17:30,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.424e+01 2.680e+01 3.106e+01 1.010e+02, threshold=5.360e+01, percent-clipped=5.0 2024-08-13 01:17:30,266 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 from AS 2024-08-13 01:17:32,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1920290.0, ans=0.125 2024-08-13 01:17:43,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.82 vs. 
limit=15.0 2024-08-13 01:17:44,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1920390.0, ans=15.0 2024-08-13 01:17:48,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1920390.0, ans=0.0 2024-08-13 01:17:53,655 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3650, loss[loss=0.1117, beats_loss=0.009311, ecapa_loss=0.0001737, whisper_loss=0.1006, over 15885.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0001701, whisper_loss=0.0913, over 3835556.95 frames. ], batch size: 62, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:17:56,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1920490.0, ans=22.5 2024-08-13 01:18:22,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1920690.0, ans=0.125 2024-08-13 01:18:26,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-13 01:18:37,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=1920790.0, ans=0.05 2024-08-13 01:18:40,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1920790.0, ans=0.0 2024-08-13 01:18:52,800 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS 2024-08-13 01:19:01,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. 
limit=12.0 2024-08-13 01:19:03,626 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3700, loss[loss=0.09995, beats_loss=0.009299, ecapa_loss=0.0001702, whisper_loss=0.08895, over 15786.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001708, whisper_loss=0.09183, over 3835622.76 frames. ], batch size: 62, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:19:31,938 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 from AS 2024-08-13 01:19:36,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-13 01:19:49,838 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.425e+01 2.811e+01 3.262e+01 7.758e+01, threshold=5.621e+01, percent-clipped=2.0 2024-08-13 01:20:13,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3750, loss[loss=0.1127, beats_loss=0.01113, ecapa_loss=0.0001859, whisper_loss=0.09968, over 18563.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01089, ecapa_loss=0.0001704, whisper_loss=0.09175, over 3848711.51 frames. ], batch size: 77, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:20:14,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1921490.0, ans=0.125 2024-08-13 01:20:36,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-13 01:21:00,162 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 13 from Vox, 24 from AS 2024-08-13 01:21:05,625 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
32 from LS+wenet, 23 from Vox, 29 from AS 2024-08-13 01:21:08,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1921890.0, ans=0.125 2024-08-13 01:21:09,563 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 17 from Vox, 46 from AS 2024-08-13 01:21:23,441 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3800, loss[loss=0.1026, beats_loss=0.008748, ecapa_loss=0.0001372, whisper_loss=0.09252, over 15348.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01095, ecapa_loss=0.0001704, whisper_loss=0.09165, over 3863612.36 frames. ], batch size: 58, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:21:25,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1921990.0, ans=0.125 2024-08-13 01:21:59,384 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 from AS 2024-08-13 01:22:08,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.489e+01 2.785e+01 3.114e+01 6.895e+01, threshold=5.569e+01, percent-clipped=1.0 2024-08-13 01:22:14,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1922290.0, ans=0.0 2024-08-13 01:22:23,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1922390.0, ans=0.0 2024-08-13 01:22:28,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1922390.0, ans=0.0 2024-08-13 01:22:32,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3850, loss[loss=0.1025, beats_loss=0.01022, ecapa_loss=0.000163, whisper_loss=0.09061, over 18535.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01096, ecapa_loss=0.00017, whisper_loss=0.0916, over 3892629.25 frames. 
], batch size: 71, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:22:33,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1922490.0, ans=10.0 2024-08-13 01:22:34,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1922490.0, ans=0.125 2024-08-13 01:22:41,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1922490.0, ans=0.125 2024-08-13 01:22:47,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1922590.0, ans=0.5 2024-08-13 01:23:02,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1922690.0, ans=0.0 2024-08-13 01:23:09,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1922690.0, ans=0.125 2024-08-13 01:23:29,353 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 from AS 2024-08-13 01:23:35,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-13 01:23:42,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3900, loss[loss=0.0946, beats_loss=0.01196, ecapa_loss=0.0001669, whisper_loss=0.08098, over 21504.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01098, ecapa_loss=0.0001701, whisper_loss=0.09199, over 3886962.49 frames. ], batch size: 92, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:23:45,745 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 19 from Vox, 24 from AS 2024-08-13 01:23:48,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1922990.0, ans=0.125 2024-08-13 01:24:20,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1923190.0, ans=0.1 2024-08-13 01:24:28,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.595e+01 2.867e+01 3.243e+01 6.009e+01, threshold=5.735e+01, percent-clipped=1.0 2024-08-13 01:24:37,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1923390.0, ans=0.0 2024-08-13 01:24:43,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1923390.0, ans=0.95 2024-08-13 01:24:46,846 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS 2024-08-13 01:24:50,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2024-08-13 01:24:52,313 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 3950, loss[loss=0.1124, beats_loss=0.01148, ecapa_loss=0.0001606, whisper_loss=0.09928, over 23170.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01088, ecapa_loss=0.0001715, whisper_loss=0.09248, over 3909170.28 frames. ], batch size: 92, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:24:54,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1923490.0, ans=0.0 2024-08-13 01:25:12,436 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 20 from Vox, 49 from AS 2024-08-13 01:25:17,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.60 vs. 
limit=22.5 2024-08-13 01:25:18,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=12.0 2024-08-13 01:25:19,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1923690.0, ans=0.1 2024-08-13 01:25:30,559 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS 2024-08-13 01:25:41,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1923790.0, ans=0.1 2024-08-13 01:25:43,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1923790.0, ans=0.1 2024-08-13 01:25:50,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.66 vs. limit=10.0 2024-08-13 01:25:52,474 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 from AS 2024-08-13 01:25:55,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1923890.0, ans=0.125 2024-08-13 01:26:02,306 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4000, loss[loss=0.08973, beats_loss=0.01294, ecapa_loss=0.0001437, whisper_loss=0.07536, over 18608.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0109, ecapa_loss=0.0001711, whisper_loss=0.09238, over 3928156.84 frames. ], batch size: 74, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:26:02,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1923990.0, ans=0.0 2024-08-13 01:26:05,221 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
31 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 01:26:07,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2024-08-13 01:26:11,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2024-08-13 01:26:27,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1924090.0, ans=0.125 2024-08-13 01:26:28,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1924190.0, ans=0.125 2024-08-13 01:26:29,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-13 01:26:34,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1924190.0, ans=0.125 2024-08-13 01:26:36,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1924190.0, ans=0.2 2024-08-13 01:26:47,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.537e+01 2.883e+01 3.271e+01 5.034e+01, threshold=5.767e+01, percent-clipped=0.0 2024-08-13 01:26:50,918 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 01:27:12,040 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4050, loss[loss=0.1057, beats_loss=0.01098, ecapa_loss=0.0001469, whisper_loss=0.09329, over 20849.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.000171, whisper_loss=0.09199, over 3916338.28 frames. 
], batch size: 81, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:27:27,618 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 21 from Vox, 29 from AS 2024-08-13 01:27:51,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1924690.0, ans=0.125 2024-08-13 01:27:55,268 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 from AS 2024-08-13 01:27:58,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1924790.0, ans=0.015 2024-08-13 01:28:21,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4100, loss[loss=0.1009, beats_loss=0.01003, ecapa_loss=0.0002122, whisper_loss=0.08873, over 21587.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01076, ecapa_loss=0.0001723, whisper_loss=0.09308, over 3924760.78 frames. ], batch size: 91, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:28:21,526 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 from AS 2024-08-13 01:28:26,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2024-08-13 01:28:47,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1925090.0, ans=0.2 2024-08-13 01:29:03,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=15.0 2024-08-13 01:29:08,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.339e+01 2.647e+01 3.027e+01 3.702e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-13 01:29:14,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1925290.0, ans=0.0 2024-08-13 01:29:32,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4150, loss[loss=0.102, beats_loss=0.0124, ecapa_loss=0.0002052, whisper_loss=0.08758, over 21931.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0108, ecapa_loss=0.0001711, whisper_loss=0.0931, over 3943110.56 frames. ], batch size: 92, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:29:37,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1925490.0, ans=0.125 2024-08-13 01:29:44,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1925490.0, ans=0.2 2024-08-13 01:29:45,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1925590.0, ans=0.1 2024-08-13 01:29:49,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1925590.0, ans=0.125 2024-08-13 01:30:05,472 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 from AS 2024-08-13 01:30:09,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1925690.0, ans=0.125 2024-08-13 01:30:14,812 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
15 from LS+wenet, 22 from Vox, 19 from AS 2024-08-13 01:30:15,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1925790.0, ans=0.1 2024-08-13 01:30:23,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1925790.0, ans=0.125 2024-08-13 01:30:28,693 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 33 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 01:30:30,323 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 30 from LS+wenet, 18 from Vox, 18 from AS 2024-08-13 01:30:34,590 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 from AS 2024-08-13 01:30:37,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1925890.0, ans=0.125 2024-08-13 01:30:39,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1925890.0, ans=0.2 2024-08-13 01:30:43,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4200, loss[loss=0.09851, beats_loss=0.01018, ecapa_loss=0.0002127, whisper_loss=0.0862, over 21164.00 frames. ], tot_loss[loss=0.1061, beats_loss=0.01079, ecapa_loss=0.0001714, whisper_loss=0.09365, over 3939809.66 frames. ], batch size: 89, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:30:49,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1925990.0, ans=0.0 2024-08-13 01:31:20,713 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 27 from Vox, 32 from AS 2024-08-13 01:31:28,520 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.387e+01 2.732e+01 2.995e+01 7.981e+01, threshold=5.463e+01, percent-clipped=1.0 2024-08-13 01:31:28,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1926290.0, ans=0.0 2024-08-13 01:31:40,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1926390.0, ans=0.125 2024-08-13 01:31:52,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4250, loss[loss=0.1173, beats_loss=0.008916, ecapa_loss=0.0001786, whisper_loss=0.1066, over 19329.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01078, ecapa_loss=0.0001706, whisper_loss=0.09349, over 3937948.22 frames. ], batch size: 77, lr: 4.53e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:31:54,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1926490.0, ans=0.2 2024-08-13 01:32:26,453 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS 2024-08-13 01:32:38,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2024-08-13 01:32:40,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1926790.0, ans=0.07 2024-08-13 01:32:44,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-08-13 01:32:47,432 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 01:32:47,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1926890.0, ans=0.125 2024-08-13 01:32:57,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1926890.0, ans=0.0 2024-08-13 01:33:02,217 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4300, loss[loss=0.08991, beats_loss=0.00852, ecapa_loss=0.0001878, whisper_loss=0.07952, over 16057.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01075, ecapa_loss=0.0001711, whisper_loss=0.09299, over 3897036.48 frames. ], batch size: 64, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:33:05,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1926990.0, ans=0.0 2024-08-13 01:33:05,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1926990.0, ans=0.125 2024-08-13 01:33:08,007 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 01:33:19,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1927090.0, ans=0.125 2024-08-13 01:33:29,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1927190.0, ans=0.125 2024-08-13 01:33:39,948 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 01:33:41,403 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 01:33:47,215 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 01:33:48,272 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.403e+01 2.611e+01 3.081e+01 4.718e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-13 01:34:11,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4350, loss[loss=0.08384, beats_loss=0.009912, ecapa_loss=0.0001803, whisper_loss=0.07212, over 14486.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01072, ecapa_loss=0.0001711, whisper_loss=0.09308, over 3895131.68 frames. ], batch size: 59, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:34:20,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-13 01:34:22,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1927490.0, ans=0.0 2024-08-13 01:34:35,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1927590.0, ans=0.125 2024-08-13 01:34:50,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1927690.0, ans=0.0 2024-08-13 01:34:51,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1927690.0, ans=0.0 2024-08-13 01:35:03,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1927790.0, ans=0.0 2024-08-13 01:35:21,239 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4400, loss[loss=0.1095, beats_loss=0.01066, ecapa_loss=0.000207, whisper_loss=0.09674, over 15453.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01076, ecapa_loss=0.0001717, whisper_loss=0.0927, over 3883943.34 frames. 
], batch size: 62, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:35:32,256 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-13 01:35:33,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1928090.0, ans=0.2 2024-08-13 01:35:47,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 01:36:06,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.420e+01 2.637e+01 3.058e+01 4.603e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 01:36:07,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1928290.0, ans=0.0 2024-08-13 01:36:08,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1928290.0, ans=0.1 2024-08-13 01:36:13,882 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 01:36:14,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-08-13 01:36:15,286 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 28 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 01:36:17,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=12.0 2024-08-13 01:36:18,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.29 vs. 
limit=10.0 2024-08-13 01:36:30,551 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4450, loss[loss=0.1228, beats_loss=0.00858, ecapa_loss=0.0001838, whisper_loss=0.1124, over 22638.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01069, ecapa_loss=0.0001714, whisper_loss=0.09292, over 3853893.12 frames. ], batch size: 90, lr: 4.52e-03, grad_scale: 1.152921504606847e+18 2024-08-13 01:36:51,903 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 01:37:01,569 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 01:37:05,542 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 01:37:28,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1928890.0, ans=0.0 2024-08-13 01:37:34,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1928890.0, ans=0.1 2024-08-13 01:37:39,857 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4500, loss[loss=0.09002, beats_loss=0.00817, ecapa_loss=0.0002148, whisper_loss=0.0797, over 15862.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01083, ecapa_loss=0.0001699, whisper_loss=0.09248, over 3857483.26 frames. ], batch size: 63, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:37:42,998 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-13 01:37:46,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-13 01:38:00,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.07 vs. 
limit=15.0 2024-08-13 01:38:00,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0 2024-08-13 01:38:11,202 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 01:38:27,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.366e+01 2.717e+01 3.132e+01 4.916e+01, threshold=5.434e+01, percent-clipped=0.0 2024-08-13 01:38:29,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1929290.0, ans=0.125 2024-08-13 01:38:33,221 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 01:38:35,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2024-08-13 01:38:49,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4550, loss[loss=0.1078, beats_loss=0.01136, ecapa_loss=0.0001741, whisper_loss=0.09467, over 22156.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001704, whisper_loss=0.09209, over 3849627.40 frames. ], batch size: 88, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:38:51,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1929490.0, ans=0.125 2024-08-13 01:38:51,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.73 vs. limit=22.5 2024-08-13 01:38:53,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1929490.0, ans=0.125 2024-08-13 01:38:55,686 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-13 01:39:01,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1929490.0, ans=0.125 2024-08-13 01:39:18,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1929690.0, ans=0.0 2024-08-13 01:39:37,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1929790.0, ans=0.125 2024-08-13 01:39:38,965 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 25 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 01:39:44,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1929890.0, ans=0.125 2024-08-13 01:39:59,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4600, loss[loss=0.08904, beats_loss=0.01301, ecapa_loss=0.0001333, whisper_loss=0.0747, over 18393.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001701, whisper_loss=0.09198, over 3815322.33 frames. ], batch size: 73, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:40:01,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1929990.0, ans=0.0 2024-08-13 01:40:02,928 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 19 from LS+wenet, 25 from Vox, 50 fro AS 2024-08-13 01:40:04,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1929990.0, ans=0.0 2024-08-13 01:40:10,921 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 01:40:14,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.62 vs. 
limit=10.0 2024-08-13 01:40:21,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1930090.0, ans=0.0 2024-08-13 01:40:41,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1930290.0, ans=0.0 2024-08-13 01:40:46,088 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.515e+01 2.755e+01 3.045e+01 4.770e+01, threshold=5.510e+01, percent-clipped=0.0 2024-08-13 01:41:07,988 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4650, loss[loss=0.1093, beats_loss=0.01134, ecapa_loss=0.0001496, whisper_loss=0.09651, over 16183.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001706, whisper_loss=0.09101, over 3828657.06 frames. ], batch size: 62, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:41:09,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1930490.0, ans=0.125 2024-08-13 01:41:11,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1930490.0, ans=0.1 2024-08-13 01:41:16,760 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 01:41:17,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1930490.0, ans=0.0 2024-08-13 01:41:20,774 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 01:41:28,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1930590.0, ans=0.1 2024-08-13 01:41:32,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1930590.0, ans=0.125 2024-08-13 01:41:34,732 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 01:41:36,063 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-13 01:41:56,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1930790.0, ans=0.125 2024-08-13 01:42:06,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1930890.0, ans=0.0 2024-08-13 01:42:16,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4700, loss[loss=0.1072, beats_loss=0.01195, ecapa_loss=0.0001705, whisper_loss=0.09355, over 20020.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001709, whisper_loss=0.09139, over 3834115.81 frames. ], batch size: 78, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:42:19,935 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-13 01:42:20,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1930990.0, ans=0.125 2024-08-13 01:42:24,043 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
32 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 01:42:29,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1931090.0, ans=0.125 2024-08-13 01:42:33,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1931090.0, ans=0.0 2024-08-13 01:42:39,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1931090.0, ans=0.125 2024-08-13 01:42:41,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2024-08-13 01:42:48,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1931190.0, ans=0.2 2024-08-13 01:42:54,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1931190.0, ans=0.125 2024-08-13 01:43:03,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.542e+01 2.823e+01 3.098e+01 3.628e+02, threshold=5.646e+01, percent-clipped=2.0 2024-08-13 01:43:06,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1931290.0, ans=0.1 2024-08-13 01:43:23,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1931390.0, ans=0.125 2024-08-13 01:43:26,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4750, loss[loss=0.1011, beats_loss=0.01098, ecapa_loss=0.0002501, whisper_loss=0.08766, over 21182.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001716, whisper_loss=0.09149, over 3838324.97 frames. 
], batch size: 91, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:43:41,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1931590.0, ans=0.2 2024-08-13 01:43:52,403 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 17 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-13 01:44:00,464 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 29 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-13 01:44:06,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1931690.0, ans=0.125 2024-08-13 01:44:18,993 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 12 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 01:44:23,271 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 01:44:24,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1931890.0, ans=0.0 2024-08-13 01:44:42,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4800, loss[loss=0.1224, beats_loss=0.01045, ecapa_loss=0.0001841, whisper_loss=0.1101, over 22409.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001729, whisper_loss=0.09073, over 3872295.89 frames. ], batch size: 91, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:44:54,355 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.83 vs. limit=15.0 2024-08-13 01:44:58,931 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 01:45:07,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1932090.0, ans=0.1 2024-08-13 01:45:23,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1932190.0, ans=0.05 2024-08-13 01:45:26,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1932190.0, ans=0.2 2024-08-13 01:45:27,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1932190.0, ans=0.125 2024-08-13 01:45:32,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5 2024-08-13 01:45:49,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.506e+01 2.786e+01 3.078e+01 4.876e+01, threshold=5.572e+01, percent-clipped=0.0 2024-08-13 01:45:52,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1932290.0, ans=0.0 2024-08-13 01:45:55,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-13 01:46:08,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-13 01:46:16,795 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 01:46:21,810 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4850, loss[loss=0.09079, beats_loss=0.011, ecapa_loss=0.0001736, whisper_loss=0.07805, over 21404.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01093, ecapa_loss=0.000173, whisper_loss=0.09022, over 3867378.05 frames. ], batch size: 86, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:46:24,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1932490.0, ans=0.1 2024-08-13 01:46:48,676 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 01:47:02,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1932590.0, ans=0.0 2024-08-13 01:47:04,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1932690.0, ans=0.0 2024-08-13 01:47:15,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1932690.0, ans=0.1 2024-08-13 01:47:39,991 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 01:47:43,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1932790.0, ans=0.2 2024-08-13 01:47:54,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1932890.0, ans=0.0 2024-08-13 01:48:11,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4900, loss[loss=0.09271, beats_loss=0.01197, ecapa_loss=0.0001333, whisper_loss=0.0794, over 22659.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01094, ecapa_loss=0.0001713, whisper_loss=0.09024, over 3862455.72 frames. ], batch size: 88, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:48:46,103 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 10 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 01:48:48,177 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 01:49:05,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933190.0, ans=0.125 2024-08-13 01:49:20,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1933290.0, ans=0.125 2024-08-13 01:49:25,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1933290.0, ans=0.1 2024-08-13 01:49:27,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1933290.0, ans=0.1 2024-08-13 01:49:32,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.462e+01 2.765e+01 3.056e+01 4.985e+01, threshold=5.531e+01, percent-clipped=0.0 2024-08-13 01:49:48,799 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 01:50:03,708 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 4950, loss[loss=0.09893, beats_loss=0.01264, ecapa_loss=0.00018, whisper_loss=0.08449, over 21458.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01103, ecapa_loss=0.00017, whisper_loss=0.09033, over 3863882.79 frames. ], batch size: 91, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:50:35,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1933690.0, ans=0.07 2024-08-13 01:50:45,253 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
38 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 01:51:00,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1933790.0, ans=0.015 2024-08-13 01:51:21,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5000, loss[loss=0.1136, beats_loss=0.01094, ecapa_loss=0.0001741, whisper_loss=0.101, over 17182.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01098, ecapa_loss=0.0001702, whisper_loss=0.09077, over 3894304.61 frames. ], batch size: 66, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:51:24,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1933990.0, ans=0.0 2024-08-13 01:51:42,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1934090.0, ans=0.125 2024-08-13 01:51:42,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=12.0 2024-08-13 01:51:57,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1934190.0, ans=0.125 2024-08-13 01:52:02,931 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 01:52:09,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1934290.0, ans=0.125 2024-08-13 01:52:13,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.373e+01 2.737e+01 3.184e+01 6.268e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 01:52:39,238 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5050, loss[loss=0.09762, beats_loss=0.01105, ecapa_loss=0.0001358, whisper_loss=0.08521, over 21201.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01106, ecapa_loss=0.0001699, whisper_loss=0.09096, over 3899678.38 frames. 
], batch size: 81, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:52:48,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1934490.0, ans=0.0 2024-08-13 01:52:56,438 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 01:52:58,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2024-08-13 01:53:09,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1934590.0, ans=0.125 2024-08-13 01:53:40,158 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 01:53:55,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1934890.0, ans=0.04949747468305833 2024-08-13 01:54:00,068 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5100, loss[loss=0.111, beats_loss=0.009012, ecapa_loss=0.0001735, whisper_loss=0.1003, over 19661.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01104, ecapa_loss=0.000169, whisper_loss=0.09167, over 3911480.26 frames. ], batch size: 77, lr: 4.52e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:54:06,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-13 01:54:07,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2024-08-13 01:54:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1935090.0, ans=0.125 2024-08-13 01:54:26,664 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 01:54:38,916 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 01:54:43,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1935190.0, ans=0.125 2024-08-13 01:54:45,380 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 01:54:49,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1935290.0, ans=0.1 2024-08-13 01:54:56,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.475e+01 2.679e+01 3.018e+01 4.914e+01, threshold=5.357e+01, percent-clipped=0.0 2024-08-13 01:55:00,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1935290.0, ans=0.0 2024-08-13 01:55:11,867 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 01:55:12,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2024-08-13 01:55:22,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5150, loss[loss=0.1214, beats_loss=0.01008, ecapa_loss=0.0001352, whisper_loss=0.1099, over 23405.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01095, ecapa_loss=0.0001681, whisper_loss=0.09251, over 3917429.01 frames. ], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:55:50,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0 2024-08-13 01:55:56,113 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
16 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 01:56:11,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1935790.0, ans=0.125 2024-08-13 01:56:25,854 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 30 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-13 01:56:26,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=12.0 2024-08-13 01:56:27,158 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 01:56:47,728 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5200, loss[loss=0.09524, beats_loss=0.00957, ecapa_loss=0.000202, whisper_loss=0.08365, over 17708.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09198, over 3918404.70 frames. ], batch size: 73, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:57:05,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1936090.0, ans=0.0 2024-08-13 01:57:05,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1936090.0, ans=0.1 2024-08-13 01:57:08,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1936090.0, ans=0.1 2024-08-13 01:57:11,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1936090.0, ans=0.2 2024-08-13 01:57:12,612 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 01:57:28,699 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 01:57:42,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.433e+01 2.676e+01 3.023e+01 1.012e+02, threshold=5.352e+01, percent-clipped=2.0 2024-08-13 01:57:43,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1936290.0, ans=0.125 2024-08-13 01:58:08,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5250, loss[loss=0.09251, beats_loss=0.01458, ecapa_loss=0.0001281, whisper_loss=0.07665, over 23939.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001681, whisper_loss=0.092, over 3920356.64 frames. ], batch size: 95, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:58:18,920 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 21 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 01:58:20,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:21,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1936490.0, ans=0.125 2024-08-13 01:58:25,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1936590.0, ans=0.1 2024-08-13 01:58:40,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1936690.0, ans=0.125 2024-08-13 01:58:40,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1936690.0, ans=0.125 2024-08-13 01:58:41,534 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-13 01:58:44,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=1936690.0, ans=0.5 2024-08-13 01:58:52,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1936690.0, ans=0.2 2024-08-13 01:58:58,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1936790.0, ans=0.2 2024-08-13 01:59:21,901 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 24 from LS+wenet, 33 from Vox, 39 fro AS 2024-08-13 01:59:24,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0 2024-08-13 01:59:30,448 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5300, loss[loss=0.1178, beats_loss=0.0105, ecapa_loss=0.0001549, whisper_loss=0.1057, over 20974.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001701, whisper_loss=0.0917, over 3891679.45 frames. ], batch size: 82, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 01:59:57,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1937090.0, ans=0.125 2024-08-13 02:00:03,829 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 02:00:05,205 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 02:00:12,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1937190.0, ans=0.0 2024-08-13 02:00:25,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.483e+01 2.816e+01 3.213e+01 1.142e+02, threshold=5.632e+01, percent-clipped=3.0 2024-08-13 02:00:32,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1937290.0, ans=0.125 2024-08-13 02:00:38,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1937390.0, ans=0.125 2024-08-13 02:00:41,867 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 02:00:43,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1937390.0, ans=0.125 2024-08-13 02:00:47,666 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 02:00:51,029 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5350, loss[loss=0.1001, beats_loss=0.01039, ecapa_loss=0.0001707, whisper_loss=0.08799, over 22512.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001702, whisper_loss=0.09216, over 3894480.76 frames. ], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:01:10,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1937590.0, ans=0.125 2024-08-13 02:01:18,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. 
limit=15.0 2024-08-13 02:01:58,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-13 02:02:05,917 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 02:02:13,618 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5400, loss[loss=0.1331, beats_loss=0.01058, ecapa_loss=0.0001235, whisper_loss=0.1213, over 24736.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01088, ecapa_loss=0.0001676, whisper_loss=0.09201, over 3891674.54 frames. ], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:02:15,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1937990.0, ans=0.125 2024-08-13 02:02:17,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1937990.0, ans=0.125 2024-08-13 02:02:38,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1938090.0, ans=0.125 2024-08-13 02:02:44,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1938090.0, ans=0.125 2024-08-13 02:02:44,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-08-13 02:02:52,886 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 30 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-13 02:02:59,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1938190.0, ans=0.0 2024-08-13 02:03:08,526 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 02:03:09,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.492e+01 2.751e+01 3.252e+01 5.304e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-13 02:03:10,333 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 02:03:15,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=15.0 2024-08-13 02:03:22,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1938390.0, ans=0.1 2024-08-13 02:03:29,118 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 02:03:34,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1938390.0, ans=0.125 2024-08-13 02:03:37,133 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5450, loss[loss=0.101, beats_loss=0.0101, ecapa_loss=0.0001774, whisper_loss=0.08917, over 22995.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0109, ecapa_loss=0.0001677, whisper_loss=0.09143, over 3894284.83 frames. ], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:03:46,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1938490.0, ans=10.0 2024-08-13 02:03:47,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1938490.0, ans=0.0 2024-08-13 02:04:20,359 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 02:04:25,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=1938790.0, ans=10.0 2024-08-13 02:04:26,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1938790.0, ans=0.125 2024-08-13 02:04:29,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1938790.0, ans=0.0 2024-08-13 02:04:40,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1938790.0, ans=0.125 2024-08-13 02:04:59,450 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5500, loss[loss=0.08175, beats_loss=0.01162, ecapa_loss=0.0001704, whisper_loss=0.06843, over 14534.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01092, ecapa_loss=0.0001676, whisper_loss=0.09179, over 3940123.87 frames. ], batch size: 59, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:05:03,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1938990.0, ans=0.1 2024-08-13 02:05:45,019 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 02:05:45,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-08-13 02:05:49,652 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 02:05:52,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.471e+01 2.738e+01 3.080e+01 7.605e+01, threshold=5.476e+01, percent-clipped=2.0 2024-08-13 02:06:10,461 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
33 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 02:06:18,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5550, loss[loss=0.1044, beats_loss=0.01193, ecapa_loss=0.0001892, whisper_loss=0.09055, over 22230.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01091, ecapa_loss=0.0001681, whisper_loss=0.09215, over 3938157.58 frames. ], batch size: 93, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:06:19,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2024-08-13 02:06:36,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1939590.0, ans=0.125 2024-08-13 02:06:36,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-13 02:06:41,302 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 02:06:43,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1939590.0, ans=0.125 2024-08-13 02:07:03,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1939690.0, ans=0.125 2024-08-13 02:07:09,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1939790.0, ans=15.0 2024-08-13 02:07:12,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-08-13 02:07:14,951 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 02:07:37,477 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-13 02:07:38,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=12.0 2024-08-13 02:07:38,813 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5600, loss[loss=0.0944, beats_loss=0.01286, ecapa_loss=0.0001564, whisper_loss=0.07997, over 19408.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.011, ecapa_loss=0.0001669, whisper_loss=0.0914, over 3942461.42 frames. ], batch size: 79, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:07:39,048 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 02:07:42,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1939990.0, ans=0.0 2024-08-13 02:07:43,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1939990.0, ans=15.0 2024-08-13 02:08:09,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1940090.0, ans=0.125 2024-08-13 02:08:17,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=12.0 2024-08-13 02:08:19,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2024-08-13 02:08:32,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1940290.0, ans=0.125 2024-08-13 02:08:35,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.482e+01 2.705e+01 3.003e+01 6.205e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 02:08:37,446 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 02:08:40,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1940290.0, ans=0.1 2024-08-13 02:08:47,859 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 20 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-13 02:08:52,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1940390.0, ans=0.125 2024-08-13 02:09:01,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5650, loss[loss=0.126, beats_loss=0.0121, ecapa_loss=0.0001655, whisper_loss=0.1122, over 22115.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001674, whisper_loss=0.09101, over 3941099.31 frames. ], batch size: 88, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:09:06,867 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 02:09:13,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1940490.0, ans=0.125 2024-08-13 02:09:16,950 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 02:09:20,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1940590.0, ans=0.0 2024-08-13 02:09:22,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-13 02:09:32,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1940590.0, ans=0.125 2024-08-13 02:09:40,505 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 02:09:42,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1940690.0, ans=0.0 2024-08-13 02:10:21,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2024-08-13 02:10:22,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5700, loss[loss=0.1136, beats_loss=0.01128, ecapa_loss=0.0001688, whisper_loss=0.1006, over 22545.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001687, whisper_loss=0.09151, over 3928486.52 frames. ], batch size: 91, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:10:49,546 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 24 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-13 02:11:16,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.518e+01 2.759e+01 3.173e+01 1.965e+02, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 02:11:18,354 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-13 02:11:28,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1941390.0, ans=0.125 2024-08-13 02:11:34,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-08-13 02:11:40,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1941490.0, ans=0.0 2024-08-13 02:11:41,413 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5750, loss[loss=0.1125, beats_loss=0.009474, ecapa_loss=0.0001704, whisper_loss=0.1013, over 14550.00 frames. 
], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001675, whisper_loss=0.09129, over 3883761.88 frames. ], batch size: 58, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:11:43,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1941490.0, ans=0.125 2024-08-13 02:11:47,137 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 02:11:53,330 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 02:12:01,458 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 02:12:14,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1941690.0, ans=0.125 2024-08-13 02:12:15,561 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-13 02:12:32,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1941790.0, ans=0.0 2024-08-13 02:12:47,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-13 02:12:49,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1941890.0, ans=0.125 2024-08-13 02:12:50,939 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 02:13:02,703 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5800, loss[loss=0.0888, beats_loss=0.00879, ecapa_loss=0.0002015, whisper_loss=0.078, over 13290.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01086, ecapa_loss=0.000169, whisper_loss=0.09166, over 3849211.56 frames. 
], batch size: 54, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:13:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1941990.0, ans=0.1 2024-08-13 02:13:08,742 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 02:13:17,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=22.5 2024-08-13 02:13:23,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-13 02:13:57,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.443e+01 2.748e+01 3.161e+01 4.611e+01, threshold=5.495e+01, percent-clipped=0.0 2024-08-13 02:13:58,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1942290.0, ans=0.125 2024-08-13 02:14:24,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5850, loss[loss=0.1036, beats_loss=0.01226, ecapa_loss=0.0001419, whisper_loss=0.08992, over 21941.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001676, whisper_loss=0.09161, over 3884326.54 frames. ], batch size: 89, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:14:46,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.90 vs. 
limit=10.0 2024-08-13 02:15:05,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1942690.0, ans=0.1 2024-08-13 02:15:22,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1942790.0, ans=0.0 2024-08-13 02:15:39,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1942890.0, ans=0.125 2024-08-13 02:15:41,191 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 21 from LS+wenet, 30 from Vox, 43 fro AS 2024-08-13 02:15:47,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5900, loss[loss=0.1042, beats_loss=0.009118, ecapa_loss=0.0001899, whisper_loss=0.09316, over 21232.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001683, whisper_loss=0.09161, over 3895079.18 frames. ], batch size: 86, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:16:18,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2024-08-13 02:16:19,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1943190.0, ans=10.0 2024-08-13 02:16:23,808 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 24 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 02:16:40,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.528e+01 2.790e+01 3.084e+01 1.766e+02, threshold=5.581e+01, percent-clipped=1.0 2024-08-13 02:16:59,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1943390.0, ans=0.125 2024-08-13 02:17:03,391 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 02:17:07,170 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 5950, loss[loss=0.1051, beats_loss=0.01064, ecapa_loss=0.0001907, whisper_loss=0.09256, over 21111.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01103, ecapa_loss=0.0001679, whisper_loss=0.09068, over 3870316.91 frames. ], batch size: 87, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:17:19,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1943490.0, ans=0.0 2024-08-13 02:17:29,290 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 02:17:52,119 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-13 02:17:55,390 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 02:17:55,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1943790.0, ans=0.0 2024-08-13 02:17:57,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1943790.0, ans=0.125 2024-08-13 02:18:11,528 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 02:18:28,579 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6000, loss[loss=0.113, beats_loss=0.01232, ecapa_loss=0.0001837, whisper_loss=0.09885, over 21752.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01104, ecapa_loss=0.0001673, whisper_loss=0.0914, over 3883582.42 frames. 
], batch size: 92, lr: 4.51e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:18:28,580 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 02:19:07,039 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005835, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 02:19:25,670 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on SV_voxceleb1: loss=0.004586, beats_loss=0, ecapa_loss=0.0004586, whisper_loss=0, over 939242.00 frames. 2024-08-13 02:21:14,514 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on AT_audioset: loss=0.02397, beats_loss=0.02397, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 02:21:14,518 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 02:21:17,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1943990.0, ans=0.0 2024-08-13 02:21:24,392 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 02:22:09,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1944290.0, ans=0.125 2024-08-13 02:22:10,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.472e+01 2.800e+01 3.130e+01 4.518e+01, threshold=5.599e+01, percent-clipped=0.0 2024-08-13 02:22:12,493 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:22:13,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-08-13 02:22:15,609 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
11 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-13 02:22:33,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1944390.0, ans=0.125 2024-08-13 02:22:35,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6050, loss[loss=0.08772, beats_loss=0.01093, ecapa_loss=0.0001555, whisper_loss=0.07524, over 19496.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01096, ecapa_loss=0.0001673, whisper_loss=0.09202, over 3863798.92 frames. ], batch size: 77, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:22:36,127 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 02:22:43,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1944490.0, ans=0.125 2024-08-13 02:22:43,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1944490.0, ans=0.0 2024-08-13 02:22:51,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1944590.0, ans=0.125 2024-08-13 02:22:54,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1944590.0, ans=0.125 2024-08-13 02:22:57,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1944590.0, ans=0.07 2024-08-13 02:23:00,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1944590.0, ans=0.5 2024-08-13 02:23:11,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=12.0 2024-08-13 02:23:26,822 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 02:23:54,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-13 02:23:58,260 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6100, loss[loss=0.08552, beats_loss=0.01318, ecapa_loss=0.0001785, whisper_loss=0.07055, over 21829.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01095, ecapa_loss=0.0001695, whisper_loss=0.09158, over 3878097.58 frames. ], batch size: 93, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:23:58,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1944990.0, ans=0.125 2024-08-13 02:24:11,380 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 02:24:14,564 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 02:24:23,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1945090.0, ans=0.125 2024-08-13 02:24:32,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:44,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1945190.0, ans=0.125 2024-08-13 02:24:52,592 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 02:24:53,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=22.5 2024-08-13 02:24:53,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.564e+01 2.945e+01 3.314e+01 6.954e+01, threshold=5.890e+01, percent-clipped=1.0 2024-08-13 02:24:57,435 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 02:25:13,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1945390.0, ans=0.1 2024-08-13 02:25:21,091 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6150, loss[loss=0.1157, beats_loss=0.009691, ecapa_loss=0.0001506, whisper_loss=0.1045, over 21261.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01094, ecapa_loss=0.0001701, whisper_loss=0.09161, over 3878260.38 frames. ], batch size: 83, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:25:24,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1945490.0, ans=0.0 2024-08-13 02:25:26,283 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 02:25:27,632 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 02:25:29,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1945490.0, ans=0.125 2024-08-13 02:25:30,596 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-13 02:25:37,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1945590.0, ans=0.125 2024-08-13 02:25:47,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1945590.0, ans=0.125 2024-08-13 02:25:55,168 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-13 02:26:00,247 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 02:26:12,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1945790.0, ans=0.2 2024-08-13 02:26:18,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1945790.0, ans=0.125 2024-08-13 02:26:21,439 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 02:26:29,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1945890.0, ans=0.125 2024-08-13 02:26:35,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1945890.0, ans=0.125 2024-08-13 02:26:42,212 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6200, loss[loss=0.09851, beats_loss=0.01183, ecapa_loss=0.0001623, whisper_loss=0.08506, over 22603.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.000169, whisper_loss=0.09174, over 3883249.14 frames. ], batch size: 92, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:26:55,252 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 02:27:13,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1946190.0, ans=0.125 2024-08-13 02:27:16,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.08 vs. 
limit=22.5 2024-08-13 02:27:39,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.499e+01 2.801e+01 3.134e+01 4.474e+01, threshold=5.602e+01, percent-clipped=0.0 2024-08-13 02:28:05,149 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6250, loss[loss=0.09411, beats_loss=0.01207, ecapa_loss=0.0001644, whisper_loss=0.08039, over 14242.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001687, whisper_loss=0.09155, over 3876778.96 frames. ], batch size: 56, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:28:18,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1946490.0, ans=0.125 2024-08-13 02:28:20,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1946590.0, ans=0.1 2024-08-13 02:28:25,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1946590.0, ans=0.2 2024-08-13 02:28:31,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=8.0 2024-08-13 02:28:41,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1946690.0, ans=0.125 2024-08-13 02:28:41,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1946690.0, ans=0.125 2024-08-13 02:28:49,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=12.0 2024-08-13 02:28:51,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. 
limit=15.0 2024-08-13 02:28:57,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2024-08-13 02:29:09,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.59 vs. limit=22.5 2024-08-13 02:29:14,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-08-13 02:29:26,990 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6300, loss[loss=0.08753, beats_loss=0.01021, ecapa_loss=0.0001457, whisper_loss=0.07586, over 13755.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001694, whisper_loss=0.09152, over 3847073.08 frames. ], batch size: 54, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:29:37,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-13 02:29:48,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1947090.0, ans=0.125 2024-08-13 02:30:00,871 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 02:30:01,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1947190.0, ans=0.2 2024-08-13 02:30:07,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1947190.0, ans=0.125 2024-08-13 02:30:12,017 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 02:30:20,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.428e+01 2.719e+01 3.075e+01 5.745e+01, threshold=5.438e+01, percent-clipped=1.0 2024-08-13 02:30:32,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2024-08-13 02:30:45,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6350, loss[loss=0.09645, beats_loss=0.008412, ecapa_loss=0.0001702, whisper_loss=0.08633, over 14211.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.0109, ecapa_loss=0.0001699, whisper_loss=0.09156, over 3847254.15 frames. ], batch size: 56, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:30:50,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1947490.0, ans=0.125 2024-08-13 02:30:57,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1947490.0, ans=0.125 2024-08-13 02:31:01,891 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-13 02:31:31,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1947690.0, ans=0.125 2024-08-13 02:31:38,143 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 02:31:47,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1947790.0, ans=0.125 2024-08-13 02:31:51,456 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 02:31:54,731 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 02:32:07,352 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6400, loss[loss=0.1194, beats_loss=0.01033, ecapa_loss=0.0001596, whisper_loss=0.1074, over 24124.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01087, ecapa_loss=0.0001684, whisper_loss=0.09214, over 3880903.24 frames. ], batch size: 92, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:32:20,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-13 02:32:35,828 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-13 02:32:36,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1948090.0, ans=0.125 2024-08-13 02:32:37,046 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-13 02:32:38,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1948190.0, ans=0.1 2024-08-13 02:32:46,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1948190.0, ans=0.125 2024-08-13 02:33:04,970 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.410e+01 2.725e+01 3.146e+01 5.039e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 02:33:09,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-13 02:33:10,251 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-13 02:33:20,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1948390.0, ans=0.125 2024-08-13 02:33:31,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6450, loss[loss=0.1195, beats_loss=0.007738, ecapa_loss=0.0002258, whisper_loss=0.1095, over 21273.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01089, ecapa_loss=0.0001686, whisper_loss=0.0927, over 3932666.69 frames. ], batch size: 89, lr: 4.50e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:33:31,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1948490.0, ans=0.05 2024-08-13 02:33:47,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1948590.0, ans=0.0 2024-08-13 02:33:58,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1948590.0, ans=0.125 2024-08-13 02:34:01,867 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 02:34:10,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1948690.0, ans=0.0 2024-08-13 02:34:34,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1948790.0, ans=0.1 2024-08-13 02:34:35,688 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 02:34:46,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1948890.0, ans=0.0 2024-08-13 02:34:55,241 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6500, loss[loss=0.1384, beats_loss=0.006297, ecapa_loss=0.0001696, whisper_loss=0.1304, over 17641.00 frames. 
], tot_loss[loss=0.1058, beats_loss=0.01082, ecapa_loss=0.0001694, whisper_loss=0.09324, over 3926688.61 frames. ], batch size: 65, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:35:00,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1948990.0, ans=0.0 2024-08-13 02:35:00,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1948990.0, ans=0.0 2024-08-13 02:35:08,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-08-13 02:35:26,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1949190.0, ans=0.1 2024-08-13 02:35:28,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1949190.0, ans=0.04949747468305833 2024-08-13 02:35:44,700 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 02:35:48,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1949290.0, ans=0.125 2024-08-13 02:35:51,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.473e+01 2.682e+01 2.925e+01 4.435e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-13 02:36:17,544 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6550, loss[loss=0.1137, beats_loss=0.01216, ecapa_loss=0.0001353, whisper_loss=0.1002, over 22599.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01084, ecapa_loss=0.0001691, whisper_loss=0.09282, over 3923241.46 frames. ], batch size: 87, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:36:23,511 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 02:36:41,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1949590.0, ans=0.0 2024-08-13 02:36:53,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2024-08-13 02:37:26,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1949890.0, ans=0.125 2024-08-13 02:37:41,520 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6600, loss[loss=0.1186, beats_loss=0.007012, ecapa_loss=0.0002108, whisper_loss=0.1095, over 21087.00 frames. ], tot_loss[loss=0.106, beats_loss=0.01079, ecapa_loss=0.0001695, whisper_loss=0.09347, over 3956229.95 frames. ], batch size: 85, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:37:43,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1949990.0, ans=0.125 2024-08-13 02:37:48,319 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 02:38:25,752 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 02:38:28,280 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-13 02:38:37,709 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 02:38:46,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1950190.0, ans=0.125 2024-08-13 02:38:54,447 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 02:39:14,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.510e+01 2.757e+01 3.096e+01 4.067e+01, threshold=5.514e+01, percent-clipped=0.0 2024-08-13 02:39:39,268 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6650, loss[loss=0.125, beats_loss=0.009608, ecapa_loss=0.0001832, whisper_loss=0.1135, over 22648.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01082, ecapa_loss=0.0001692, whisper_loss=0.09288, over 3951937.44 frames. ], batch size: 89, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:39:44,075 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 02:40:21,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0 2024-08-13 02:40:39,001 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 02:40:42,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2024-08-13 02:40:45,891 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-13 02:41:05,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.74 vs. limit=15.0 2024-08-13 02:41:16,347 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6700, loss[loss=0.115, beats_loss=0.01061, ecapa_loss=0.0001803, whisper_loss=0.1026, over 23463.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0001696, whisper_loss=0.09218, over 3930862.54 frames. 
], batch size: 93, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:41:24,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1950990.0, ans=0.125 2024-08-13 02:41:34,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-13 02:41:35,210 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 28 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 02:41:44,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-13 02:42:04,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.797e-03 2024-08-13 02:42:12,772 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 02:42:23,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+01 2.594e+01 2.894e+01 3.478e+01 5.381e+01, threshold=5.788e+01, percent-clipped=0.0 2024-08-13 02:42:26,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1951290.0, ans=0.125 2024-08-13 02:43:00,128 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6750, loss[loss=0.1146, beats_loss=0.00975, ecapa_loss=0.0001592, whisper_loss=0.1033, over 17062.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01085, ecapa_loss=0.00017, whisper_loss=0.09229, over 3908625.06 frames. ], batch size: 64, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:43:01,606 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
19 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 02:43:25,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1951590.0, ans=0.2 2024-08-13 02:43:29,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1951590.0, ans=0.09899494936611666 2024-08-13 02:43:30,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1951590.0, ans=0.0 2024-08-13 02:43:35,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1951590.0, ans=0.125 2024-08-13 02:43:37,639 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 02:44:21,765 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 02:44:41,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-13 02:44:50,024 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 02:44:52,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1951890.0, ans=0.025 2024-08-13 02:44:57,568 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6800, loss[loss=0.1064, beats_loss=0.01082, ecapa_loss=0.0002129, whisper_loss=0.09348, over 20201.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01091, ecapa_loss=0.0001696, whisper_loss=0.09131, over 3877351.78 frames. 
], batch size: 88, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:45:01,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1951990.0, ans=0.125 2024-08-13 02:45:36,468 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-13 02:46:10,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1952290.0, ans=0.125 2024-08-13 02:46:13,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1952290.0, ans=0.2 2024-08-13 02:46:13,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1952290.0, ans=0.125 2024-08-13 02:46:16,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.452e+01 2.716e+01 3.076e+01 4.037e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 02:46:29,074 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 33 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-13 02:46:36,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1952390.0, ans=0.125 2024-08-13 02:46:39,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1952390.0, ans=6.0 2024-08-13 02:46:48,596 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 02:46:52,766 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6850, loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.08889, over 19775.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001692, whisper_loss=0.09118, over 3840524.66 frames. 
], batch size: 77, lr: 4.50e-03, grad_scale: 1.152921504606847e+18 2024-08-13 02:47:00,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1952490.0, ans=0.125 2024-08-13 02:47:09,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2024-08-13 02:47:17,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1952590.0, ans=0.1 2024-08-13 02:47:27,638 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 02:47:30,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1952590.0, ans=0.125 2024-08-13 02:47:37,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1952590.0, ans=0.125 2024-08-13 02:47:51,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1952690.0, ans=0.2 2024-08-13 02:48:11,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1952790.0, ans=0.0 2024-08-13 02:48:23,567 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-13 02:48:43,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6900, loss[loss=0.09971, beats_loss=0.009716, ecapa_loss=0.0001744, whisper_loss=0.08825, over 18172.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01087, ecapa_loss=0.0001687, whisper_loss=0.09074, over 3823264.67 frames. ], batch size: 74, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:49:12,602 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
13 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 02:49:27,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 02:49:28,564 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-13 02:49:41,588 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.589e+01 2.754e+01 3.182e+01 2.951e+02, threshold=5.508e+01, percent-clipped=1.0 2024-08-13 02:49:58,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1953390.0, ans=0.2 2024-08-13 02:50:07,470 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 6950, loss[loss=0.0966, beats_loss=0.01112, ecapa_loss=0.0001929, whisper_loss=0.08354, over 18069.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09081, over 3822753.06 frames. ], batch size: 74, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:50:20,628 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 02:50:21,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0 2024-08-13 02:50:26,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1953590.0, ans=0.125 2024-08-13 02:50:35,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1953590.0, ans=0.125 2024-08-13 02:50:44,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=1953690.0, ans=0.1 2024-08-13 02:51:03,185 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 02:51:18,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1953790.0, ans=0.2 2024-08-13 02:51:27,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=22.5 2024-08-13 02:51:38,716 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 02:51:42,780 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7000, loss[loss=0.09668, beats_loss=0.0127, ecapa_loss=0.0001257, whisper_loss=0.08272, over 18786.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01092, ecapa_loss=0.0001682, whisper_loss=0.09035, over 3806759.50 frames. ], batch size: 71, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:51:42,991 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 02:52:04,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1954090.0, ans=0.0 2024-08-13 02:52:08,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1954090.0, ans=0.125 2024-08-13 02:52:48,667 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.444e+01 2.710e+01 2.918e+01 4.538e+01, threshold=5.419e+01, percent-clipped=0.0 2024-08-13 02:52:53,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1954290.0, ans=0.0 2024-08-13 02:53:12,421 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 02:53:16,205 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7050, loss[loss=0.1319, beats_loss=0.009364, ecapa_loss=0.0001332, whisper_loss=0.1212, over 24084.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001684, whisper_loss=0.09076, over 3819627.68 frames. ], batch size: 90, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:53:48,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1954590.0, ans=0.09899494936611666 2024-08-13 02:53:54,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-13 02:54:13,366 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-13 02:54:21,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1954790.0, ans=0.035 2024-08-13 02:54:31,824 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 02:54:39,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1954890.0, ans=0.1 2024-08-13 02:54:43,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0 2024-08-13 02:54:46,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1954990.0, ans=0.125 2024-08-13 02:54:48,362 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7100, loss[loss=0.1072, beats_loss=0.0114, ecapa_loss=0.0001993, whisper_loss=0.09383, over 17655.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001674, whisper_loss=0.09057, over 3818899.34 frames. 
], batch size: 75, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:54:50,378 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.207e+01 2024-08-13 02:54:58,971 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 02:55:25,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.62 vs. limit=10.0 2024-08-13 02:55:37,787 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 02:55:52,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.441e+01 2.743e+01 3.182e+01 6.176e+01, threshold=5.486e+01, percent-clipped=2.0 2024-08-13 02:55:55,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1955290.0, ans=0.0 2024-08-13 02:56:06,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1955390.0, ans=0.125 2024-08-13 02:56:15,456 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 02:56:19,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.04 vs. limit=10.0 2024-08-13 02:56:20,297 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7150, loss[loss=0.1116, beats_loss=0.01007, ecapa_loss=0.0001446, whisper_loss=0.1001, over 20299.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001685, whisper_loss=0.09138, over 3854151.03 frames. ], batch size: 76, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:56:25,391 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
24 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-13 02:56:26,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1955490.0, ans=0.2 2024-08-13 02:56:36,287 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 02:56:38,143 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 02:56:45,908 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 02:57:19,230 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 02:57:29,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1955790.0, ans=0.125 2024-08-13 02:57:40,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-13 02:57:53,594 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7200, loss[loss=0.1107, beats_loss=0.01187, ecapa_loss=0.0001455, whisper_loss=0.0974, over 23279.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001683, whisper_loss=0.09122, over 3881273.00 frames. ], batch size: 91, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:57:59,398 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 02:58:04,764 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
22 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 02:58:08,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1955990.0, ans=0.125 2024-08-13 02:58:27,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-13 02:58:28,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1956190.0, ans=0.0 2024-08-13 02:58:32,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-13 02:58:56,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.408e+01 2.678e+01 2.996e+01 6.633e+01, threshold=5.357e+01, percent-clipped=2.0 2024-08-13 02:59:22,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1956490.0, ans=0.0 2024-08-13 02:59:23,720 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7250, loss[loss=0.1084, beats_loss=0.0107, ecapa_loss=0.0001804, whisper_loss=0.09591, over 18476.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001687, whisper_loss=0.09151, over 3879016.15 frames. ], batch size: 79, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 02:59:25,152 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-13 03:00:06,237 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
20 from LS+wenet, 13 from Vox, 32 from AS 2024-08-13 03:00:15,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1956790.0, ans=0.0 2024-08-13 03:00:26,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-13 03:00:48,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5 2024-08-13 03:00:51,915 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 11 from Vox, 35 from AS 2024-08-13 03:00:52,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1956990.0, ans=0.125 2024-08-13 03:00:53,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7300, loss[loss=0.1121, beats_loss=0.01185, ecapa_loss=0.0001311, whisper_loss=0.09891, over 18225.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0108, ecapa_loss=0.00017, whisper_loss=0.09192, over 3878684.48 frames. ], batch size: 71, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:01:04,066 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 15 from Vox, 49 from AS 2024-08-13 03:01:19,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1957090.0, ans=0.0 2024-08-13 03:01:19,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1957090.0, ans=0.125 2024-08-13 03:01:20,568 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 23 from Vox, 15 from AS 2024-08-13 03:01:28,240 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 17 from Vox, 31 from AS 2024-08-13 03:01:31,830 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-13 03:01:39,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1957190.0, ans=0.0 2024-08-13 03:01:41,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-13 03:01:55,241 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.017e+01 2.448e+01 2.774e+01 3.121e+01 5.439e+01, threshold=5.548e+01, percent-clipped=1.0 2024-08-13 03:02:00,909 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS 2024-08-13 03:02:14,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-13 03:02:15,682 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 03:02:21,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7350, loss[loss=0.09527, beats_loss=0.01211, ecapa_loss=0.0001711, whisper_loss=0.08145, over 17069.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001693, whisper_loss=0.09205, over 3855568.54 frames. ], batch size: 68, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:02:28,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1957490.0, ans=0.0 2024-08-13 03:03:06,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1957690.0, ans=0.125 2024-08-13 03:03:34,446 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
20 from LS+wenet, 20 from Vox, 36 from AS 2024-08-13 03:03:39,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1957890.0, ans=0.125 2024-08-13 03:03:41,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-13 03:03:45,824 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7400, loss[loss=0.09659, beats_loss=0.009623, ecapa_loss=0.0001759, whisper_loss=0.08521, over 22659.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0001697, whisper_loss=0.09219, over 3861534.45 frames. ], batch size: 91, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:03:46,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1957990.0, ans=0.1 2024-08-13 03:03:49,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1957990.0, ans=0.125 2024-08-13 03:03:54,553 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.494e-02 2024-08-13 03:04:00,903 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 from AS 2024-08-13 03:04:02,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1958090.0, ans=0.1 2024-08-13 03:04:07,699 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-13 03:04:11,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1958090.0, ans=0.125 2024-08-13 03:04:14,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2024-08-13 03:04:44,000 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.515e+01 2.775e+01 3.372e+01 5.725e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 03:05:09,123 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7450, loss[loss=0.1049, beats_loss=0.01102, ecapa_loss=0.0001675, whisper_loss=0.09219, over 21807.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01085, ecapa_loss=0.0001709, whisper_loss=0.09215, over 3869937.57 frames. ], batch size: 88, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:05:15,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1958490.0, ans=0.125 2024-08-13 03:05:19,581 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 from AS 2024-08-13 03:05:21,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1958490.0, ans=0.015 2024-08-13 03:05:40,152 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 11 from Vox, 36 from AS 2024-08-13 03:05:40,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.55 vs. limit=15.0 2024-08-13 03:05:45,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1958690.0, ans=0.125 2024-08-13 03:05:48,674 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
27 from LS+wenet, 18 from Vox, 44 from AS 2024-08-13 03:05:58,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1958790.0, ans=0.125 2024-08-13 03:05:59,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1958790.0, ans=0.1 2024-08-13 03:06:01,288 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 03:06:14,381 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS 2024-08-13 03:06:31,513 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7500, loss[loss=0.1047, beats_loss=0.01025, ecapa_loss=0.000181, whisper_loss=0.09269, over 22063.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01091, ecapa_loss=0.0001694, whisper_loss=0.09264, over 3912262.98 frames. ], batch size: 89, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:06:31,678 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-13 03:06:45,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1958990.0, ans=0.0 2024-08-13 03:06:51,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1959090.0, ans=0.025 2024-08-13 03:06:53,361 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 39 from LS+wenet, 17 from Vox, 32 from AS 2024-08-13 03:07:02,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1959090.0, ans=0.1 2024-08-13 03:07:06,689 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
21 from LS+wenet, 13 from Vox, 23 from AS 2024-08-13 03:07:16,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.27 vs. limit=22.5 2024-08-13 03:07:28,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.459e+01 2.697e+01 3.000e+01 4.880e+01, threshold=5.394e+01, percent-clipped=0.0 2024-08-13 03:07:43,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1959390.0, ans=0.1 2024-08-13 03:07:46,870 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 15 from Vox, 34 from AS 2024-08-13 03:07:52,930 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7550, loss[loss=0.08582, beats_loss=0.01203, ecapa_loss=0.0001883, whisper_loss=0.0719, over 15524.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01084, ecapa_loss=0.0001701, whisper_loss=0.09213, over 3844737.43 frames. ], batch size: 67, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:07:53,051 INFO [train_multi_KD3.py:844] (0/4) A total of 99 cuts. 21 from LS+wenet, 32 from Vox, 46 from AS 2024-08-13 03:08:09,489 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 from AS 2024-08-13 03:08:16,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.52 vs. limit=15.0 2024-08-13 03:08:17,335 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 from AS 2024-08-13 03:08:27,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1959690.0, ans=0.2 2024-08-13 03:08:27,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. 
limit=15.0 2024-08-13 03:08:30,748 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 03:08:38,469 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 16 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 03:08:50,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1959790.0, ans=0.125 2024-08-13 03:08:50,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1959790.0, ans=0.125 2024-08-13 03:09:11,861 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-196000.pt 2024-08-13 03:09:14,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7600, loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001693, whisper_loss=0.08863, over 23073.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09186, over 3837986.03 frames. ], batch size: 91, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:09:28,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1960090.0, ans=0.0 2024-08-13 03:09:37,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-13 03:09:52,571 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 22 from Vox, 17 from AS 2024-08-13 03:09:57,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1960190.0, ans=0.1 2024-08-13 03:10:03,780 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 from AS 2024-08-13 03:10:08,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.552e+01 2.815e+01 3.111e+01 1.865e+02, threshold=5.629e+01, percent-clipped=3.0 2024-08-13 03:10:12,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-13 03:10:31,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1960490.0, ans=0.125 2024-08-13 03:10:32,106 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7650, loss[loss=0.08205, beats_loss=0.01245, ecapa_loss=0.0001791, whisper_loss=0.06781, over 17179.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001691, whisper_loss=0.09177, over 3851492.98 frames. ], batch size: 71, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:10:52,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1960590.0, ans=0.125 2024-08-13 03:10:53,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-08-13 03:10:58,811 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 from AS 2024-08-13 03:11:09,354 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
23 from LS+wenet, 29 from Vox, 39 from AS 2024-08-13 03:11:11,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1960690.0, ans=0.04949747468305833 2024-08-13 03:11:13,782 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS 2024-08-13 03:11:23,374 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 from AS 2024-08-13 03:11:24,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1960790.0, ans=0.125 2024-08-13 03:11:25,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1960790.0, ans=0.125 2024-08-13 03:11:35,471 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.695e-02 2024-08-13 03:11:50,493 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7700, loss[loss=0.08509, beats_loss=0.01112, ecapa_loss=0.0001737, whisper_loss=0.07223, over 22578.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01084, ecapa_loss=0.00017, whisper_loss=0.09066, over 3863650.73 frames. ], batch size: 94, lr: 4.49e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:12:01,977 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 17 from Vox, 33 from AS 2024-08-13 03:12:10,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1961090.0, ans=0.0 2024-08-13 03:12:22,061 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
15 from LS+wenet, 16 from Vox, 28 from AS 2024-08-13 03:12:44,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.500e+01 2.815e+01 3.285e+01 6.862e+01, threshold=5.629e+01, percent-clipped=1.0 2024-08-13 03:12:50,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1961290.0, ans=0.0 2024-08-13 03:13:07,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1961490.0, ans=0.125 2024-08-13 03:13:08,355 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7750, loss[loss=0.1091, beats_loss=0.01257, ecapa_loss=0.0001647, whisper_loss=0.09488, over 22568.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001702, whisper_loss=0.09089, over 3910561.67 frames. ], batch size: 94, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:13:11,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5 2024-08-13 03:13:19,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1961490.0, ans=0.125 2024-08-13 03:13:45,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1961690.0, ans=0.125 2024-08-13 03:14:10,464 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 03:14:25,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7800, loss[loss=0.1021, beats_loss=0.01111, ecapa_loss=0.0001527, whisper_loss=0.08942, over 15547.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01085, ecapa_loss=0.0001699, whisper_loss=0.09165, over 3923051.03 frames. 
], batch size: 58, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:14:27,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1961990.0, ans=0.0 2024-08-13 03:14:55,329 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 from AS 2024-08-13 03:15:02,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1962190.0, ans=0.0 2024-08-13 03:15:19,802 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.419e+01 2.661e+01 3.121e+01 6.090e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-13 03:15:30,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.35 vs. limit=15.0 2024-08-13 03:15:30,734 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 29 from Vox, 37 from AS 2024-08-13 03:15:33,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1962390.0, ans=0.125 2024-08-13 03:15:43,418 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7850, loss[loss=0.118, beats_loss=0.009977, ecapa_loss=0.0001541, whisper_loss=0.1065, over 22042.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01079, ecapa_loss=0.0001703, whisper_loss=0.09222, over 3907888.15 frames. 
], batch size: 85, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:16:02,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1962590.0, ans=0.2 2024-08-13 03:16:11,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1962590.0, ans=0.0 2024-08-13 03:16:30,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1962790.0, ans=0.125 2024-08-13 03:16:41,632 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 03:16:46,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2024-08-13 03:16:59,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7900, loss[loss=0.06978, beats_loss=0.01014, ecapa_loss=0.0002002, whisper_loss=0.05764, over 16518.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001693, whisper_loss=0.09109, over 3872412.00 frames. ], batch size: 67, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:17:00,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1962990.0, ans=0.2 2024-08-13 03:17:08,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1962990.0, ans=10.0 2024-08-13 03:17:22,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1963090.0, ans=0.125 2024-08-13 03:17:28,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. 
limit=22.5 2024-08-13 03:17:44,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-13 03:17:52,576 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.438e+01 2.739e+01 3.083e+01 5.244e+01, threshold=5.477e+01, percent-clipped=0.0 2024-08-13 03:17:55,591 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 03:17:57,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-13 03:18:04,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1963390.0, ans=0.025 2024-08-13 03:18:06,912 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 from AS 2024-08-13 03:18:13,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-08-13 03:18:14,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 7950, loss[loss=0.1115, beats_loss=0.01213, ecapa_loss=0.0001375, whisper_loss=0.09795, over 19009.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01092, ecapa_loss=0.0001677, whisper_loss=0.09226, over 3905488.66 frames. ], batch size: 73, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:18:25,332 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 20 from LS+wenet, 21 from Vox, 50 from AS 2024-08-13 03:18:40,792 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 from AS 2024-08-13 03:19:28,931 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8000, loss[loss=0.09097, beats_loss=0.01168, ecapa_loss=0.0001765, whisper_loss=0.07752, over 19409.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01093, ecapa_loss=0.0001665, whisper_loss=0.09201, over 3924280.46 frames. ], batch size: 81, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:19:29,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 03:19:41,676 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 from AS 2024-08-13 03:19:46,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-13 03:19:49,634 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 23 from LS+wenet, 30 from Vox, 38 from AS 2024-08-13 03:19:52,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1964090.0, ans=0.0 2024-08-13 03:20:04,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1964190.0, ans=0.125 2024-08-13 03:20:09,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1964190.0, ans=0.1 2024-08-13 03:20:21,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.304e+01 2.712e+01 2.987e+01 5.432e+01, threshold=5.425e+01, percent-clipped=0.0 2024-08-13 03:20:42,381 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8050, loss[loss=0.105, beats_loss=0.009453, ecapa_loss=0.0001732, whisper_loss=0.09386, over 22500.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01088, ecapa_loss=0.000167, whisper_loss=0.09233, over 3917444.01 frames. 
], batch size: 92, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:20:46,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1964490.0, ans=0.125 2024-08-13 03:20:47,356 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS 2024-08-13 03:21:04,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1964590.0, ans=0.1 2024-08-13 03:21:14,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=1964690.0, ans=10.0 2024-08-13 03:21:17,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1964690.0, ans=0.125 2024-08-13 03:21:45,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1964890.0, ans=10.0 2024-08-13 03:21:45,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1964890.0, ans=0.0 2024-08-13 03:21:49,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1964890.0, ans=0.1 2024-08-13 03:21:51,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1964990.0, ans=0.1 2024-08-13 03:21:51,938 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8100, loss[loss=0.08458, beats_loss=0.01356, ecapa_loss=0.0001602, whisper_loss=0.06942, over 18861.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.000167, whisper_loss=0.0918, over 3912966.62 frames. ], batch size: 79, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:21:56,454 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
20 from LS+wenet, 15 from Vox, 27 from AS 2024-08-13 03:21:58,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1964990.0, ans=0.0 2024-08-13 03:22:39,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.433e+01 2.725e+01 3.019e+01 1.220e+02, threshold=5.449e+01, percent-clipped=1.0 2024-08-13 03:22:58,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1965390.0, ans=0.0 2024-08-13 03:23:00,264 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 34 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 03:23:01,255 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8150, loss[loss=0.1094, beats_loss=0.01037, ecapa_loss=0.0001817, whisper_loss=0.09721, over 23676.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01092, ecapa_loss=0.000168, whisper_loss=0.09149, over 3891515.09 frames. ], batch size: 95, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:23:08,764 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 from AS 2024-08-13 03:23:13,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1965490.0, ans=0.1 2024-08-13 03:23:25,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1965590.0, ans=0.125 2024-08-13 03:23:59,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1965890.0, ans=0.0 2024-08-13 03:23:59,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2024-08-13 03:24:08,191 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-13 03:24:08,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1965890.0, ans=0.125 2024-08-13 03:24:10,635 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8200, loss[loss=0.105, beats_loss=0.01173, ecapa_loss=0.0001539, whisper_loss=0.09168, over 20621.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001683, whisper_loss=0.09122, over 3919419.36 frames. ], batch size: 83, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:24:13,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1965990.0, ans=0.125 2024-08-13 03:24:23,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1966090.0, ans=0.0 2024-08-13 03:24:33,152 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.649e-01 2024-08-13 03:24:39,500 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 03:24:47,230 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 03:24:58,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.097e+01 2.561e+01 2.768e+01 3.091e+01 7.365e+01, threshold=5.537e+01, percent-clipped=2.0 2024-08-13 03:25:00,173 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 from AS 2024-08-13 03:25:07,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1966390.0, ans=0.0 2024-08-13 03:25:11,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1966390.0, ans=0.05 2024-08-13 03:25:14,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-13 03:25:15,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1966390.0, ans=0.125 2024-08-13 03:25:19,146 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8250, loss[loss=0.09457, beats_loss=0.01247, ecapa_loss=0.0001729, whisper_loss=0.08037, over 15597.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01093, ecapa_loss=0.0001676, whisper_loss=0.09146, over 3929730.26 frames. ], batch size: 65, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:25:21,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1966490.0, ans=0.125 2024-08-13 03:25:28,523 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 03:25:33,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1966590.0, ans=0.0 2024-08-13 03:25:44,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1966690.0, ans=0.125 2024-08-13 03:25:52,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1966690.0, ans=0.125 2024-08-13 03:26:07,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1966790.0, ans=0.2 2024-08-13 03:26:09,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1966790.0, ans=0.0 2024-08-13 03:26:09,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1966790.0, ans=0.125 2024-08-13 03:26:13,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-13 03:26:23,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1966890.0, ans=0.125 2024-08-13 03:26:25,391 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8300, loss[loss=0.1062, beats_loss=0.00856, ecapa_loss=0.0002037, whisper_loss=0.09558, over 19221.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001671, whisper_loss=0.09118, over 3912007.59 frames. ], batch size: 79, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:26:26,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1966990.0, ans=0.0 2024-08-13 03:26:34,795 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 03:26:48,031 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 03:26:48,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1967090.0, ans=0.125 2024-08-13 03:26:50,993 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 03:26:58,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-13 03:27:01,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=12.0 2024-08-13 03:27:12,898 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.397e+01 2.699e+01 2.951e+01 6.635e+01, threshold=5.397e+01, percent-clipped=2.0 2024-08-13 03:27:17,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1967290.0, ans=0.125 2024-08-13 03:27:19,973 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 03:27:33,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8350, loss[loss=0.09653, beats_loss=0.01153, ecapa_loss=0.0001175, whisper_loss=0.08383, over 15766.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01104, ecapa_loss=0.0001669, whisper_loss=0.09057, over 3879311.56 frames. ], batch size: 60, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:01,888 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
25 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-13 03:28:30,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1967890.0, ans=0.125 2024-08-13 03:28:36,800 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 03:28:39,677 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 03:28:42,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8400, loss[loss=0.09886, beats_loss=0.01082, ecapa_loss=0.0001821, whisper_loss=0.08623, over 18826.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01099, ecapa_loss=0.0001686, whisper_loss=0.09128, over 3887357.53 frames. ], batch size: 80, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:28:59,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0 2024-08-13 03:29:00,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1968090.0, ans=0.125 2024-08-13 03:29:19,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1968190.0, ans=0.0 2024-08-13 03:29:20,173 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
17 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-13 03:29:20,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1968190.0, ans=0.0 2024-08-13 03:29:30,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.441e+01 2.759e+01 3.099e+01 1.310e+02, threshold=5.518e+01, percent-clipped=1.0 2024-08-13 03:29:38,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1968390.0, ans=0.0 2024-08-13 03:29:40,983 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 03:29:45,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=22.5 2024-08-13 03:29:46,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1968390.0, ans=0.1 2024-08-13 03:29:51,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8450, loss[loss=0.1196, beats_loss=0.008834, ecapa_loss=0.0001893, whisper_loss=0.1089, over 22211.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01094, ecapa_loss=0.0001678, whisper_loss=0.09117, over 3891378.91 frames. 
], batch size: 90, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:29:54,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1968490.0, ans=0.125 2024-08-13 03:30:23,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1968690.0, ans=0.05 2024-08-13 03:30:23,984 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.645e-03 2024-08-13 03:30:24,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1968690.0, ans=0.125 2024-08-13 03:30:44,382 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 03:30:56,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1968890.0, ans=0.0 2024-08-13 03:30:59,952 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8500, loss[loss=0.09077, beats_loss=0.01167, ecapa_loss=0.0001332, whisper_loss=0.07777, over 15783.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01094, ecapa_loss=0.0001672, whisper_loss=0.09086, over 3894591.23 frames. 
], batch size: 60, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:31:04,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1968990.0, ans=0.0 2024-08-13 03:31:10,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1968990.0, ans=0.125 2024-08-13 03:31:16,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1969090.0, ans=0.125 2024-08-13 03:31:17,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-13 03:31:19,639 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.423e+05 2024-08-13 03:31:33,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1969190.0, ans=0.2 2024-08-13 03:31:48,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.417e+01 2.734e+01 3.054e+01 8.886e+01, threshold=5.467e+01, percent-clipped=1.0 2024-08-13 03:31:50,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1969290.0, ans=0.125 2024-08-13 03:31:58,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.86 vs. limit=22.5 2024-08-13 03:32:03,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1969390.0, ans=0.1 2024-08-13 03:32:08,464 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8550, loss[loss=0.08295, beats_loss=0.0122, ecapa_loss=0.0001136, whisper_loss=0.06961, over 15222.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001668, whisper_loss=0.09125, over 3885649.53 frames. ], batch size: 56, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:32:26,573 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 03:33:17,173 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8600, loss[loss=0.1108, beats_loss=0.009401, ecapa_loss=0.0001505, whisper_loss=0.09992, over 17895.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.000166, whisper_loss=0.09163, over 3902095.68 frames. ], batch size: 69, lr: 4.48e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:33:30,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1970090.0, ans=0.2 2024-08-13 03:33:46,637 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 03:34:06,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.434e+01 2.755e+01 2.994e+01 8.345e+01, threshold=5.511e+01, percent-clipped=1.0 2024-08-13 03:34:18,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1970390.0, ans=0.125 2024-08-13 03:34:28,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8650, loss[loss=0.1172, beats_loss=0.008112, ecapa_loss=0.0002091, whisper_loss=0.107, over 17417.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001677, whisper_loss=0.09244, over 3918042.03 frames. 
], batch size: 72, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:34:32,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1970490.0, ans=0.1 2024-08-13 03:34:37,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1970490.0, ans=0.125 2024-08-13 03:34:38,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2024-08-13 03:34:48,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-08-13 03:34:51,382 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 03:34:55,498 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 03:34:58,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1970690.0, ans=0.0 2024-08-13 03:35:01,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1970690.0, ans=0.125 2024-08-13 03:35:06,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=1970690.0, ans=0.2 2024-08-13 03:35:10,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1970690.0, ans=0.125 2024-08-13 03:35:18,005 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 13 from Vox, 49 fro AS 2024-08-13 03:35:18,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1970790.0, ans=0.1 2024-08-13 03:35:19,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1970790.0, ans=0.1 2024-08-13 03:35:21,183 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 03:35:30,181 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 03:35:32,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-13 03:35:44,088 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8700, loss[loss=0.07082, beats_loss=0.01338, ecapa_loss=0.0001525, whisper_loss=0.05592, over 16047.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001677, whisper_loss=0.09171, over 3873077.26 frames. ], batch size: 61, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:36:01,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1971090.0, ans=0.2 2024-08-13 03:36:02,622 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 03:36:13,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1971090.0, ans=0.125 2024-08-13 03:36:30,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1971190.0, ans=0.0 2024-08-13 03:36:40,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.118e+01 2.522e+01 2.761e+01 3.315e+01 1.069e+02, threshold=5.521e+01, percent-clipped=2.0 2024-08-13 03:36:53,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2024-08-13 03:37:05,281 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8750, loss[loss=0.1158, beats_loss=0.009192, ecapa_loss=0.000156, whisper_loss=0.1051, over 14960.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001687, whisper_loss=0.09158, over 3880744.02 frames. ], batch size: 59, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:37:12,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-13 03:37:19,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. 
limit=15.0 2024-08-13 03:37:21,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1971590.0, ans=0.1 2024-08-13 03:37:21,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1971590.0, ans=0.2 2024-08-13 03:37:31,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-08-13 03:37:42,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1971690.0, ans=0.125 2024-08-13 03:37:47,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1971690.0, ans=0.1 2024-08-13 03:38:03,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=22.5 2024-08-13 03:38:04,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1971790.0, ans=0.05 2024-08-13 03:38:24,762 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8800, loss[loss=0.0944, beats_loss=0.01116, ecapa_loss=0.0001813, whisper_loss=0.08142, over 21107.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001701, whisper_loss=0.09182, over 3900617.58 frames. ], batch size: 88, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:38:38,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-08-13 03:38:47,634 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 03:38:56,114 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 03:39:23,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.033e+01 2.400e+01 2.713e+01 2.983e+01 4.963e+01, threshold=5.426e+01, percent-clipped=0.0 2024-08-13 03:39:32,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1972390.0, ans=0.0 2024-08-13 03:39:46,405 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8850, loss[loss=0.1192, beats_loss=0.01079, ecapa_loss=0.000173, whisper_loss=0.1067, over 21731.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01088, ecapa_loss=0.0001687, whisper_loss=0.09121, over 3875708.62 frames. ], batch size: 88, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:39:57,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1972490.0, ans=0.0 2024-08-13 03:39:59,066 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.176e+01 2024-08-13 03:40:31,420 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-13 03:40:42,382 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 03:40:49,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1972790.0, ans=0.0 2024-08-13 03:40:58,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1972890.0, ans=0.1 2024-08-13 03:41:08,101 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8900, loss[loss=0.1203, beats_loss=0.00985, ecapa_loss=0.000206, whisper_loss=0.1084, over 22575.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01094, ecapa_loss=0.0001684, whisper_loss=0.09091, over 3880196.51 frames. 
], batch size: 92, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:41:09,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1972990.0, ans=0.2 2024-08-13 03:41:10,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1972990.0, ans=0.125 2024-08-13 03:41:11,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1972990.0, ans=0.2 2024-08-13 03:41:29,380 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 34 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-13 03:41:30,743 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 03:41:38,980 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-13 03:42:00,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1973290.0, ans=0.1 2024-08-13 03:42:05,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.456e+01 2.768e+01 3.242e+01 5.170e+01, threshold=5.536e+01, percent-clipped=0.0 2024-08-13 03:42:19,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1973390.0, ans=0.1 2024-08-13 03:42:29,736 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 8950, loss[loss=0.1258, beats_loss=0.007864, ecapa_loss=0.0001946, whisper_loss=0.116, over 20739.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001683, whisper_loss=0.09143, over 3876997.13 frames. ], batch size: 85, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:42:58,754 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-13 03:43:00,070 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 03:43:09,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1973690.0, ans=0.04949747468305833 2024-08-13 03:43:47,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1973990.0, ans=0.125 2024-08-13 03:43:48,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9000, loss[loss=0.1038, beats_loss=0.01173, ecapa_loss=0.0001638, whisper_loss=0.09045, over 22327.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01097, ecapa_loss=0.0001696, whisper_loss=0.09122, over 3881379.76 frames. ], batch size: 91, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:43:48,157 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 03:44:28,211 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005752, whisper_loss=0.2484, over 922467.00 frames. 2024-08-13 03:44:46,379 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on SV_voxceleb1: loss=0.004584, beats_loss=0, ecapa_loss=0.0004584, whisper_loss=0, over 939242.00 frames. 2024-08-13 03:45:12,181 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0655, 3.2712, 3.3000, 2.9615], device='cuda:0') 2024-08-13 03:46:42,090 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on AT_audioset: loss=0.02386, beats_loss=0.02386, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 03:46:42,095 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 03:47:02,553 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-13 03:47:15,507 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 03:47:15,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1974190.0, ans=0.125 2024-08-13 03:47:21,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1974190.0, ans=0.0 2024-08-13 03:47:34,037 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 03:47:41,776 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.559e+01 2.793e+01 3.240e+01 5.167e+01, threshold=5.585e+01, percent-clipped=0.0 2024-08-13 03:47:43,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1974290.0, ans=0.0 2024-08-13 03:47:48,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2024-08-13 03:47:49,511 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-13 03:48:07,092 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9050, loss[loss=0.1125, beats_loss=0.01061, ecapa_loss=0.0001893, whisper_loss=0.1, over 22971.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01094, ecapa_loss=0.0001693, whisper_loss=0.09165, over 3884239.28 frames. ], batch size: 94, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:48:22,221 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 03:48:48,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. 
limit=22.5 2024-08-13 03:49:13,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1974890.0, ans=0.07 2024-08-13 03:49:23,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1974890.0, ans=0.125 2024-08-13 03:49:28,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9100, loss[loss=0.09959, beats_loss=0.009784, ecapa_loss=0.0001806, whisper_loss=0.088, over 17654.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01083, ecapa_loss=0.0001695, whisper_loss=0.09226, over 3892872.18 frames. ], batch size: 70, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:49:49,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1975090.0, ans=0.125 2024-08-13 03:50:07,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1975190.0, ans=0.125 2024-08-13 03:50:09,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-13 03:50:26,273 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.465e+01 2.794e+01 3.182e+01 5.687e+01, threshold=5.588e+01, percent-clipped=1.0 2024-08-13 03:50:43,476 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 03:50:46,067 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 03:50:52,136 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9150, loss[loss=0.1068, beats_loss=0.01034, ecapa_loss=0.0001804, whisper_loss=0.0947, over 19911.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001705, whisper_loss=0.09216, over 3876989.53 frames. 
], batch size: 78, lr: 4.47e-03, grad_scale: 1.152921504606847e+18 2024-08-13 03:50:54,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2024-08-13 03:50:57,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1975490.0, ans=0.2 2024-08-13 03:51:08,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1975590.0, ans=0.125 2024-08-13 03:51:15,389 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 03:51:15,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1975590.0, ans=0.125 2024-08-13 03:51:20,382 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-13 03:51:20,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1975590.0, ans=0.125 2024-08-13 03:51:42,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2024-08-13 03:51:44,981 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 03:51:50,441 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-13 03:51:52,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1975790.0, ans=0.125 2024-08-13 03:52:13,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9200, loss[loss=0.1019, beats_loss=0.01149, ecapa_loss=0.0001418, whisper_loss=0.08903, over 19839.00 frames. 
], tot_loss[loss=0.105, beats_loss=0.01079, ecapa_loss=0.0001695, whisper_loss=0.09256, over 3904271.97 frames. ], batch size: 77, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:52:23,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1975990.0, ans=0.125 2024-08-13 03:52:25,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=12.0 2024-08-13 03:52:29,410 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-13 03:52:29,471 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.641e+01 2024-08-13 03:52:32,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1976090.0, ans=0.0 2024-08-13 03:52:37,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1976090.0, ans=0.2 2024-08-13 03:52:43,868 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 03:53:02,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=22.5 2024-08-13 03:53:11,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.444e+01 2.723e+01 3.266e+01 6.783e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-13 03:53:11,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.01 vs. 
limit=22.5 2024-08-13 03:53:20,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1976390.0, ans=0.1 2024-08-13 03:53:32,646 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9250, loss[loss=0.1102, beats_loss=0.01096, ecapa_loss=0.0001406, whisper_loss=0.09785, over 18710.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01091, ecapa_loss=0.0001696, whisper_loss=0.09161, over 3935794.32 frames. ], batch size: 73, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:53:39,405 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 03:54:06,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1976690.0, ans=0.0 2024-08-13 03:54:21,165 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 03:54:22,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1976790.0, ans=0.2 2024-08-13 03:54:41,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1976890.0, ans=0.125 2024-08-13 03:54:49,099 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 03:54:52,528 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 21 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 03:54:52,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1976990.0, ans=0.2 2024-08-13 03:54:54,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9300, loss[loss=0.1089, beats_loss=0.009589, ecapa_loss=0.0002066, whisper_loss=0.09723, over 16475.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001708, whisper_loss=0.09166, over 3937404.33 frames. 
], batch size: 67, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:55:07,854 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 03:55:08,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=22.5 2024-08-13 03:55:15,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1977090.0, ans=10.0 2024-08-13 03:55:29,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2024-08-13 03:55:30,131 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 03:55:42,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1977190.0, ans=0.0 2024-08-13 03:55:52,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1977290.0, ans=0.02 2024-08-13 03:55:55,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.460e+01 2.642e+01 2.957e+01 1.771e+02, threshold=5.283e+01, percent-clipped=2.0 2024-08-13 03:56:09,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1977390.0, ans=0.125 2024-08-13 03:56:11,836 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 03:56:17,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-13 03:56:18,751 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9350, loss[loss=0.1079, beats_loss=0.01016, ecapa_loss=0.0001971, whisper_loss=0.09574, over 18833.00 frames. 
], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001694, whisper_loss=0.09209, over 3917701.20 frames. ], batch size: 78, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:56:28,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-13 03:56:30,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2024-08-13 03:56:52,413 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.149e-01 2024-08-13 03:57:12,740 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 21 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-13 03:57:29,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1977890.0, ans=0.0 2024-08-13 03:57:38,861 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9400, loss[loss=0.09299, beats_loss=0.01219, ecapa_loss=0.000191, whisper_loss=0.07889, over 21031.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.0001685, whisper_loss=0.09177, over 3916761.29 frames. ], batch size: 91, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:57:59,356 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
27 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 03:58:19,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1978190.0, ans=0.1 2024-08-13 03:58:27,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1978290.0, ans=0.0 2024-08-13 03:58:34,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1978290.0, ans=0.0 2024-08-13 03:58:34,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.425e+01 2.641e+01 3.063e+01 7.732e+01, threshold=5.282e+01, percent-clipped=1.0 2024-08-13 03:58:43,101 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 03:58:57,130 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9450, loss[loss=0.09525, beats_loss=0.01431, ecapa_loss=0.0001351, whisper_loss=0.07959, over 19179.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01095, ecapa_loss=0.0001693, whisper_loss=0.09132, over 3905504.94 frames. ], batch size: 77, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 03:59:10,754 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.513e-02 2024-08-13 03:59:16,437 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 28 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-13 03:59:18,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1978590.0, ans=0.125 2024-08-13 03:59:21,577 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 03:59:28,362 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 
16 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-13 03:59:45,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1978790.0, ans=0.0 2024-08-13 03:59:53,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1978790.0, ans=0.0 2024-08-13 03:59:56,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1978790.0, ans=0.125 2024-08-13 04:00:02,753 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 04:00:17,076 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9500, loss[loss=0.09453, beats_loss=0.01102, ecapa_loss=0.0001873, whisper_loss=0.08163, over 21961.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01092, ecapa_loss=0.0001695, whisper_loss=0.09095, over 3901116.99 frames. ], batch size: 92, lr: 4.47e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:00:17,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1978990.0, ans=0.0 2024-08-13 04:00:21,823 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 04:00:31,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1979090.0, ans=0.125 2024-08-13 04:00:39,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1979090.0, ans=0.125 2024-08-13 04:00:47,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. 
limit=15.0 2024-08-13 04:01:12,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.497e+01 2.737e+01 3.144e+01 1.195e+02, threshold=5.474e+01, percent-clipped=3.0 2024-08-13 04:01:33,979 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9550, loss[loss=0.1039, beats_loss=0.01098, ecapa_loss=0.0001861, whisper_loss=0.0911, over 19911.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01095, ecapa_loss=0.000169, whisper_loss=0.09038, over 3879739.74 frames. ], batch size: 82, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:02:03,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-08-13 04:02:17,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2024-08-13 04:02:35,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=12.0 2024-08-13 04:02:46,845 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9600, loss[loss=0.1089, beats_loss=0.00868, ecapa_loss=0.0002606, whisper_loss=0.09764, over 15726.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001693, whisper_loss=0.09129, over 3891013.68 frames. ], batch size: 67, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:02:53,804 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 04:02:58,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1979990.0, ans=0.125 2024-08-13 04:02:58,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1979990.0, ans=0.125 2024-08-13 04:03:00,655 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
23 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-13 04:03:22,403 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 04:03:35,128 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 04:03:36,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.597e+01 2.785e+01 3.117e+01 4.817e+01, threshold=5.569e+01, percent-clipped=0.0 2024-08-13 04:03:43,446 INFO [train_multi_KD3.py:844] (0/4) A total of 97 cuts. 23 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-13 04:03:49,939 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-13 04:03:55,290 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9650, loss[loss=0.1173, beats_loss=0.01043, ecapa_loss=0.0001811, whisper_loss=0.1051, over 21365.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001702, whisper_loss=0.09111, over 3846626.08 frames. ], batch size: 85, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:04:00,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1980490.0, ans=0.0 2024-08-13 04:04:10,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-13 04:04:36,341 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 04:04:36,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1980790.0, ans=0.2 2024-08-13 04:05:01,334 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 04:05:05,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9700, loss[loss=0.1082, beats_loss=0.01141, ecapa_loss=0.0001502, whisper_loss=0.09525, over 22739.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01082, ecapa_loss=0.0001722, whisper_loss=0.09066, over 3842232.86 frames. ], batch size: 93, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:05:05,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1980990.0, ans=0.1 2024-08-13 04:05:10,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2024-08-13 04:05:22,330 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-13 04:05:29,147 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 04:05:30,470 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-13 04:05:42,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1981190.0, ans=0.125 2024-08-13 04:05:43,227 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 04:05:55,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.457e+01 2.661e+01 2.979e+01 4.854e+01, threshold=5.323e+01, percent-clipped=0.0 2024-08-13 04:06:08,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-08-13 04:06:14,730 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9750, loss[loss=0.1108, beats_loss=0.01005, ecapa_loss=0.000192, whisper_loss=0.09882, over 21622.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.000172, whisper_loss=0.09114, over 3862043.02 frames. ], batch size: 92, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:06:27,475 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 04:06:35,601 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 04:06:47,839 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 27 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 04:06:52,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1981690.0, ans=0.125 2024-08-13 04:06:54,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0 2024-08-13 04:07:07,695 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-13 04:07:09,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1981890.0, ans=0.95 2024-08-13 04:07:24,506 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9800, loss[loss=0.09806, beats_loss=0.01238, ecapa_loss=0.0001863, whisper_loss=0.08382, over 18554.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.0001711, whisper_loss=0.09145, over 3820754.68 frames. ], batch size: 75, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:07:27,611 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 04:07:28,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. 
limit=15.0 2024-08-13 04:07:31,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1981990.0, ans=0.1 2024-08-13 04:07:50,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1982090.0, ans=0.125 2024-08-13 04:08:04,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1982190.0, ans=0.125 2024-08-13 04:08:07,219 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 04:08:10,162 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 04:08:15,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.387e+01 2.562e+01 2.934e+01 4.315e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-13 04:08:20,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1982390.0, ans=0.125 2024-08-13 04:08:34,309 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9850, loss[loss=0.08294, beats_loss=0.0129, ecapa_loss=0.0001663, whisper_loss=0.06837, over 16308.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01088, ecapa_loss=0.0001698, whisper_loss=0.09079, over 3811845.64 frames. ], batch size: 64, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:08:36,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=15.0 2024-08-13 04:08:36,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2024-08-13 04:08:56,504 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
14 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-13 04:08:57,949 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 04:09:02,303 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 04:09:13,888 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 21 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-13 04:09:21,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1982790.0, ans=0.0 2024-08-13 04:09:44,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9900, loss[loss=0.1362, beats_loss=0.009211, ecapa_loss=0.0001662, whisper_loss=0.1253, over 23835.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001687, whisper_loss=0.09176, over 3867407.12 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:09:46,927 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 04:10:06,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1983090.0, ans=0.1 2024-08-13 04:10:29,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1983290.0, ans=0.125 2024-08-13 04:10:34,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.489e+01 2.832e+01 3.268e+01 9.650e+01, threshold=5.664e+01, percent-clipped=3.0 2024-08-13 04:10:36,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1983290.0, ans=0.125 2024-08-13 04:10:37,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1983290.0, ans=0.125 2024-08-13 04:10:49,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1983390.0, ans=0.0 2024-08-13 04:10:51,871 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 04:10:53,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 9950, loss[loss=0.09399, beats_loss=0.01386, ecapa_loss=0.0001591, whisper_loss=0.07854, over 19630.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01093, ecapa_loss=0.0001694, whisper_loss=0.0916, over 3855511.62 frames. ], batch size: 83, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:11:00,131 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 04:11:04,209 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:11:19,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1983690.0, ans=0.0 2024-08-13 04:11:23,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1983690.0, ans=0.0 2024-08-13 04:11:25,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1983690.0, ans=0.125 2024-08-13 04:11:33,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. limit=10.0 2024-08-13 04:12:01,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=12.0 2024-08-13 04:12:02,144 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10000, loss[loss=0.1274, beats_loss=0.006985, ecapa_loss=0.0001741, whisper_loss=0.1186, over 18926.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001703, whisper_loss=0.09102, over 3841310.11 frames. ], batch size: 72, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:12:02,401 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 14 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-13 04:12:04,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-08-13 04:12:17,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1984090.0, ans=0.125 2024-08-13 04:12:31,715 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 04:12:37,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-13 04:12:37,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.28 vs. limit=22.5 2024-08-13 04:12:50,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0 2024-08-13 04:12:52,309 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.356e+01 2.631e+01 2.870e+01 5.046e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-13 04:12:54,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2024-08-13 04:13:10,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1984490.0, ans=0.5 2024-08-13 04:13:11,559 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10050, loss[loss=0.1012, beats_loss=0.01191, ecapa_loss=0.0001855, whisper_loss=0.08748, over 22292.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001696, whisper_loss=0.09111, over 3855289.40 frames. 
], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:13:11,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1984490.0, ans=0.0 2024-08-13 04:13:12,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=1984490.0, ans=6.0 2024-08-13 04:13:22,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1984490.0, ans=0.125 2024-08-13 04:13:26,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1984590.0, ans=0.125 2024-08-13 04:13:56,160 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 04:13:56,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1984790.0, ans=0.125 2024-08-13 04:14:00,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1984790.0, ans=0.0 2024-08-13 04:14:02,910 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 04:14:04,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1984790.0, ans=0.0 2024-08-13 04:14:08,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1984890.0, ans=0.125 2024-08-13 04:14:20,837 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10100, loss[loss=0.09705, beats_loss=0.0116, ecapa_loss=0.0001555, whisper_loss=0.08389, over 19440.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.0001689, whisper_loss=0.09115, over 3867744.15 frames. 
], batch size: 75, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:14:34,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=1985090.0, ans=15.0 2024-08-13 04:14:38,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1985090.0, ans=0.125 2024-08-13 04:14:42,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1985090.0, ans=0.125 2024-08-13 04:14:55,464 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 22 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 04:15:07,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1985290.0, ans=0.125 2024-08-13 04:15:10,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.445e+01 2.629e+01 3.089e+01 3.463e+02, threshold=5.257e+01, percent-clipped=1.0 2024-08-13 04:15:13,165 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 04:15:16,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1985390.0, ans=0.1 2024-08-13 04:15:28,365 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 04:15:29,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10150, loss[loss=0.08215, beats_loss=0.01162, ecapa_loss=0.0001531, whisper_loss=0.069, over 20858.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001689, whisper_loss=0.09115, over 3885707.61 frames. 
], batch size: 84, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:15:40,676 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.533e-03 2024-08-13 04:15:47,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1985590.0, ans=0.0 2024-08-13 04:15:48,839 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 04:15:51,691 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 04:16:08,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-13 04:16:14,230 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 31 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 04:16:21,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2024-08-13 04:16:30,310 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 28 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 04:16:33,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2024-08-13 04:16:34,430 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 04:16:38,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10200, loss[loss=0.06896, beats_loss=0.01034, ecapa_loss=0.0002153, whisper_loss=0.05646, over 15857.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01093, ecapa_loss=0.0001689, whisper_loss=0.09077, over 3878526.03 frames. 
], batch size: 69, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:16:51,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-13 04:16:58,095 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 04:16:59,769 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.207e-02 2024-08-13 04:17:11,952 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 04:17:17,111 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 04:17:20,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1986290.0, ans=0.125 2024-08-13 04:17:22,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1986290.0, ans=0.0 2024-08-13 04:17:27,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.440e+01 2.685e+01 3.230e+01 3.990e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 04:17:30,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1986290.0, ans=0.125 2024-08-13 04:17:37,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1986390.0, ans=0.125 2024-08-13 04:17:47,001 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10250, loss[loss=0.1023, beats_loss=0.009984, ecapa_loss=0.0001832, whisper_loss=0.09048, over 21158.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.000169, whisper_loss=0.0912, over 3891675.08 frames. 
], batch size: 88, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:17:50,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=1986490.0, ans=15.0 2024-08-13 04:18:10,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1986590.0, ans=0.125 2024-08-13 04:18:11,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1986590.0, ans=0.125 2024-08-13 04:18:33,228 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 04:18:46,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1986890.0, ans=0.0 2024-08-13 04:18:47,461 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 19 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 04:18:55,401 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10300, loss[loss=0.1231, beats_loss=0.009502, ecapa_loss=0.0001983, whisper_loss=0.1117, over 22777.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001688, whisper_loss=0.09152, over 3892667.78 frames. ], batch size: 91, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:19:10,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.76 vs. limit=15.0 2024-08-13 04:19:20,170 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 04:19:31,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1987190.0, ans=0.125 2024-08-13 04:19:44,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.484e+01 2.743e+01 3.118e+01 4.422e+01, threshold=5.485e+01, percent-clipped=0.0 2024-08-13 04:20:03,134 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10350, loss[loss=0.08529, beats_loss=0.01078, ecapa_loss=0.0001533, whisper_loss=0.07297, over 19917.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01095, ecapa_loss=0.000167, whisper_loss=0.09079, over 3894743.91 frames. ], batch size: 78, lr: 4.46e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:20:17,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1987590.0, ans=0.05 2024-08-13 04:20:22,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1987590.0, ans=0.07 2024-08-13 04:20:28,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1987590.0, ans=0.0 2024-08-13 04:20:33,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1987690.0, ans=0.0 2024-08-13 04:20:34,897 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 04:20:36,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-08-13 04:20:39,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. 
limit=15.0 2024-08-13 04:20:50,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1987790.0, ans=0.125 2024-08-13 04:21:02,713 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 04:21:12,223 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10400, loss[loss=0.08681, beats_loss=0.01584, ecapa_loss=0.000147, whisper_loss=0.0695, over 17219.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01095, ecapa_loss=0.0001662, whisper_loss=0.09109, over 3898792.02 frames. ], batch size: 71, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:21:24,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1987990.0, ans=0.0 2024-08-13 04:21:30,714 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 04:21:37,477 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 04:21:43,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1988190.0, ans=0.125 2024-08-13 04:21:58,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1988290.0, ans=0.125 2024-08-13 04:22:01,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.435e+01 2.770e+01 3.094e+01 5.065e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 04:22:06,143 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 04:22:09,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1988390.0, ans=0.2 2024-08-13 04:22:15,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=15.0 2024-08-13 04:22:21,430 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10450, loss[loss=0.1134, beats_loss=0.01169, ecapa_loss=0.0001455, whisper_loss=0.1003, over 18750.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01092, ecapa_loss=0.000167, whisper_loss=0.09098, over 3886292.07 frames. ], batch size: 73, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:22:24,511 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.066e-03 2024-08-13 04:22:25,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-13 04:22:31,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=1988490.0, ans=0.1 2024-08-13 04:22:39,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1988590.0, ans=0.125 2024-08-13 04:22:39,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2024-08-13 04:22:41,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1988590.0, ans=0.125 2024-08-13 04:22:42,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. 
limit=10.0 2024-08-13 04:22:48,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1988690.0, ans=0.125 2024-08-13 04:22:52,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-08-13 04:22:54,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1988690.0, ans=0.125 2024-08-13 04:22:55,969 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 20 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 04:23:05,744 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 04:23:07,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1988790.0, ans=0.0 2024-08-13 04:23:12,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1988790.0, ans=0.1 2024-08-13 04:23:13,933 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 04:23:30,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10500, loss[loss=0.09043, beats_loss=0.008409, ecapa_loss=0.0001705, whisper_loss=0.08032, over 14493.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001685, whisper_loss=0.09146, over 3873908.51 frames. 
], batch size: 56, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:23:36,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1988990.0, ans=0.05 2024-08-13 04:24:03,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1989190.0, ans=0.125 2024-08-13 04:24:21,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.377e+01 2.646e+01 2.972e+01 5.578e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-13 04:24:43,048 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10550, loss[loss=0.1303, beats_loss=0.008497, ecapa_loss=0.0001871, whisper_loss=0.12, over 22824.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001692, whisper_loss=0.09127, over 3886936.91 frames. ], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:25:07,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1989590.0, ans=0.125 2024-08-13 04:25:21,074 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-13 04:25:31,910 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-13 04:25:39,463 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-13 04:26:00,442 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10600, loss[loss=0.1005, beats_loss=0.01135, ecapa_loss=0.0001521, whisper_loss=0.08762, over 16074.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01085, ecapa_loss=0.000169, whisper_loss=0.09094, over 3870974.81 frames. 
], batch size: 62, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:26:14,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1990090.0, ans=0.1 2024-08-13 04:26:28,835 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 04:26:34,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1990190.0, ans=0.0 2024-08-13 04:26:35,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-13 04:26:54,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.291e+01 2.645e+01 2.934e+01 5.325e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-13 04:26:55,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1990290.0, ans=0.0 2024-08-13 04:27:00,460 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 04:27:08,016 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 04:27:15,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10650, loss[loss=0.1026, beats_loss=0.01019, ecapa_loss=0.0001726, whisper_loss=0.09073, over 23101.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01086, ecapa_loss=0.0001666, whisper_loss=0.09116, over 3851360.03 frames. ], batch size: 92, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:27:25,566 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 04:27:31,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1990590.0, ans=0.2 2024-08-13 04:27:42,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1990590.0, ans=0.125 2024-08-13 04:27:54,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2024-08-13 04:27:57,794 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 04:28:22,550 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 04:28:35,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10700, loss[loss=0.08998, beats_loss=0.01008, ecapa_loss=0.0002428, whisper_loss=0.07748, over 16443.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.0109, ecapa_loss=0.0001664, whisper_loss=0.09138, over 3889951.41 frames. ], batch size: 71, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:28:37,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1990990.0, ans=0.07 2024-08-13 04:28:55,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1991090.0, ans=0.2 2024-08-13 04:28:55,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1991090.0, ans=0.07 2024-08-13 04:29:05,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1991190.0, ans=0.125 2024-08-13 04:29:08,518 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 04:29:13,125 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 12 from Vox, 48 fro AS 2024-08-13 04:29:21,738 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 04:29:26,501 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 31 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 04:29:30,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.433e+01 2.666e+01 3.252e+01 5.472e+01, threshold=5.332e+01, percent-clipped=1.0 2024-08-13 04:29:38,867 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-13 04:29:52,723 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10750, loss[loss=0.087, beats_loss=0.01389, ecapa_loss=0.0001832, whisper_loss=0.07127, over 17340.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01097, ecapa_loss=0.0001672, whisper_loss=0.09168, over 3903978.11 frames. ], batch size: 71, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:29:57,872 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 04:30:16,562 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 04:30:18,306 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 22 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-13 04:30:24,792 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 04:30:26,176 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
24 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-13 04:30:26,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1991690.0, ans=0.0 2024-08-13 04:30:28,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1991690.0, ans=0.2 2024-08-13 04:30:32,720 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 04:30:38,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-13 04:30:44,024 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 04:31:02,722 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-13 04:31:12,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1991990.0, ans=0.035 2024-08-13 04:31:13,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10800, loss[loss=0.09841, beats_loss=0.009014, ecapa_loss=0.0002231, whisper_loss=0.08717, over 21041.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01094, ecapa_loss=0.0001675, whisper_loss=0.09178, over 3907588.81 frames. ], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:31:29,092 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-13 04:31:43,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2024-08-13 04:31:45,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.84 vs. 
limit=12.0 2024-08-13 04:31:46,083 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 04:31:46,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1992190.0, ans=0.09899494936611666 2024-08-13 04:31:51,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-13 04:31:55,460 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 17 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-13 04:31:57,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1992190.0, ans=0.0 2024-08-13 04:32:00,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2024-08-13 04:32:10,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.408e+01 2.896e+01 3.475e+01 4.951e+01, threshold=5.792e+01, percent-clipped=0.0 2024-08-13 04:32:23,728 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-13 04:32:26,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-08-13 04:32:32,914 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10850, loss[loss=0.1169, beats_loss=0.008859, ecapa_loss=0.000169, whisper_loss=0.1064, over 22286.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.0001676, whisper_loss=0.09191, over 3875798.17 frames. 
], batch size: 90, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:32:40,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1992490.0, ans=0.125 2024-08-13 04:32:53,439 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 04:33:03,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1992690.0, ans=0.0 2024-08-13 04:33:04,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1992690.0, ans=0.0 2024-08-13 04:33:11,328 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 21 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-13 04:33:12,683 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 04:33:46,711 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 04:33:51,682 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10900, loss[loss=0.1057, beats_loss=0.01245, ecapa_loss=0.0001431, whisper_loss=0.09183, over 22325.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.000167, whisper_loss=0.09168, over 3894090.87 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 5.764607523034235e+17 2024-08-13 04:33:54,682 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 04:33:56,428 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 04:34:00,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1992990.0, ans=0.125 2024-08-13 04:34:24,713 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 04:34:25,128 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.473e-01 2024-08-13 04:34:36,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1993190.0, ans=22.5 2024-08-13 04:34:39,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-13 04:34:50,377 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 04:34:51,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.538e+01 2.794e+01 3.172e+01 4.370e+01, threshold=5.589e+01, percent-clipped=0.0 2024-08-13 04:35:09,320 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 04:35:12,633 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 10950, loss[loss=0.1107, beats_loss=0.00847, ecapa_loss=0.0001768, whisper_loss=0.1004, over 19199.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.0001671, whisper_loss=0.09212, over 3882062.32 frames. 
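[Editor's note: between batch 10900 and batch 10950 the logged grad_scale halves, from 5.764607523034235e+17 (= 2^59) to 2.8823037615171174e+17 (= 2^58). This is the standard dynamic loss-scaling behavior used with automatic mixed precision: the scale backs off by a factor when non-finite gradients are detected and grows again after a stable interval. The sketch below is a hypothetical reconstruction of that update rule with invented names (`update_grad_scale`, `found_inf`), not the icefall/PyTorch implementation itself.]

```python
def update_grad_scale(scale, found_inf, growth_interval, steps_since_growth,
                      growth_factor=2.0, backoff_factor=0.5):
    """Dynamic loss scaling: back off on overflow, grow after a stable interval.

    Mirrors the scheme behind the grad_scale values in the log above;
    names and signature are illustrative, not the real API.
    """
    if found_inf:
        # Non-finite gradients seen: halve the scale and reset the counter.
        return scale * backoff_factor, 0
    steps_since_growth += 1
    if steps_since_growth >= growth_interval:
        # No overflow for a full interval: double the scale.
        return scale * growth_factor, 0
    return scale, steps_since_growth

# The transition seen in the log: 2^59 -> 2^58 after one overflow step.
new_scale, _ = update_grad_scale(5.764607523034235e+17, found_inf=True,
                                 growth_interval=2000, steps_since_growth=0)
```

Under this rule a single overflow step reproduces the halving observed in the log; absent overflows, the scale would later double back.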
], batch size: 78, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:35:59,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1993790.0, ans=0.125 2024-08-13 04:36:02,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1993790.0, ans=0.07 2024-08-13 04:36:04,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1993790.0, ans=0.125 2024-08-13 04:36:07,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2024-08-13 04:36:32,253 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-13 04:36:32,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1993990.0, ans=0.125 2024-08-13 04:36:33,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11000, loss[loss=0.112, beats_loss=0.008832, ecapa_loss=0.0001447, whisper_loss=0.1017, over 16705.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01082, ecapa_loss=0.0001676, whisper_loss=0.09234, over 3912138.90 frames. ], batch size: 63, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:36:34,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2024-08-13 04:36:54,630 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-13 04:37:00,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1994090.0, ans=0.1 2024-08-13 04:37:10,788 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
21 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-13 04:37:21,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1994290.0, ans=0.125 2024-08-13 04:37:22,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.32 vs. limit=22.5 2024-08-13 04:37:30,654 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 18 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 04:37:30,944 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:37:33,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.603e+01 2.980e+01 9.171e+01, threshold=5.207e+01, percent-clipped=2.0 2024-08-13 04:37:33,859 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 29 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 04:37:48,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1994390.0, ans=0.2 2024-08-13 04:37:54,192 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11050, loss[loss=0.09979, beats_loss=0.008712, ecapa_loss=0.0001806, whisper_loss=0.08928, over 15316.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01078, ecapa_loss=0.0001688, whisper_loss=0.09247, over 3929989.69 frames. ], batch size: 62, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:37:56,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.93 vs. limit=22.5 2024-08-13 04:37:57,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.59 vs. limit=5.0 2024-08-13 04:38:00,351 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 04:38:25,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1994690.0, ans=0.0 2024-08-13 04:38:31,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1994690.0, ans=0.2 2024-08-13 04:38:42,648 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 04:39:07,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1994890.0, ans=0.125 2024-08-13 04:39:07,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1994890.0, ans=0.125 2024-08-13 04:39:10,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-13 04:39:18,230 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11100, loss[loss=0.08375, beats_loss=0.01049, ecapa_loss=0.0001263, whisper_loss=0.072, over 18685.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01079, ecapa_loss=0.0001678, whisper_loss=0.0916, over 3910145.70 frames. ], batch size: 70, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:39:21,290 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 04:39:56,037 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 29 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 04:40:01,133 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 04:40:04,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1995190.0, ans=0.1 2024-08-13 04:40:07,897 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 04:40:14,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1995290.0, ans=0.2 2024-08-13 04:40:23,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.346e+01 2.633e+01 2.953e+01 4.555e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-13 04:40:24,504 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-13 04:40:32,486 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 04:40:48,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1995390.0, ans=0.125 2024-08-13 04:40:52,082 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11150, loss[loss=0.1069, beats_loss=0.009587, ecapa_loss=0.0001721, whisper_loss=0.0956, over 22337.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01074, ecapa_loss=0.0001672, whisper_loss=0.09222, over 3900421.38 frames. ], batch size: 88, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:41:19,756 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 20 from LS+wenet, 31 from Vox, 43 fro AS 2024-08-13 04:41:24,484 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 04:42:01,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1995790.0, ans=0.0 2024-08-13 04:42:28,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. 
limit=15.0 2024-08-13 04:42:29,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1995890.0, ans=0.1 2024-08-13 04:42:40,875 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11200, loss[loss=0.1183, beats_loss=0.009776, ecapa_loss=0.0002036, whisper_loss=0.1065, over 21734.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01073, ecapa_loss=0.0001676, whisper_loss=0.09188, over 3871260.21 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:42:47,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=1995990.0, ans=6.0 2024-08-13 04:43:51,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1996190.0, ans=0.09899494936611666 2024-08-13 04:44:12,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.527e+01 2.790e+01 3.048e+01 4.600e+01, threshold=5.581e+01, percent-clipped=0.0 2024-08-13 04:44:13,347 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 18 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 04:44:18,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1996390.0, ans=0.2 2024-08-13 04:44:32,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-13 04:44:34,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-13 04:44:47,876 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11250, loss[loss=0.07573, beats_loss=0.01133, ecapa_loss=0.0002069, whisper_loss=0.06233, over 14424.00 frames. 
], tot_loss[loss=0.1044, beats_loss=0.01077, ecapa_loss=0.0001672, whisper_loss=0.09198, over 3858664.57 frames. ], batch size: 61, lr: 4.45e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:45:49,803 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 20 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 04:45:54,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1996690.0, ans=0.125 2024-08-13 04:46:06,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-13 04:46:31,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1996890.0, ans=0.125 2024-08-13 04:46:37,950 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 20 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-13 04:46:47,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1996890.0, ans=0.2 2024-08-13 04:46:52,342 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11300, loss[loss=0.1177, beats_loss=0.009637, ecapa_loss=0.0002162, whisper_loss=0.1059, over 19733.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001672, whisper_loss=0.09177, over 3859868.32 frames. ], batch size: 83, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:47:05,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1996990.0, ans=0.125 2024-08-13 04:47:16,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1997090.0, ans=0.1 2024-08-13 04:47:18,760 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 04:47:21,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1997090.0, ans=0.1 2024-08-13 04:47:26,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1997090.0, ans=0.125 2024-08-13 04:47:28,322 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 04:48:05,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1997190.0, ans=0.125 2024-08-13 04:48:21,354 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 04:48:21,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1997290.0, ans=0.0 2024-08-13 04:48:26,485 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 04:48:27,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+01 2.451e+01 2.765e+01 3.179e+01 5.185e+01, threshold=5.530e+01, percent-clipped=0.0 2024-08-13 04:48:39,887 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 04:48:42,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1997390.0, ans=0.125 2024-08-13 04:48:54,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1997390.0, ans=0.125 2024-08-13 04:48:59,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11350, loss[loss=0.1095, beats_loss=0.008785, ecapa_loss=0.0001823, whisper_loss=0.09892, over 17319.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01079, ecapa_loss=0.0001659, whisper_loss=0.09191, over 3889759.92 frames. 
], batch size: 69, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:49:10,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1997490.0, ans=0.0 2024-08-13 04:49:26,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1997590.0, ans=0.125 2024-08-13 04:49:46,861 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 04:49:59,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1997790.0, ans=0.1 2024-08-13 04:50:01,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1997790.0, ans=10.0 2024-08-13 04:50:12,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1997890.0, ans=0.05 2024-08-13 04:50:21,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1997890.0, ans=0.125 2024-08-13 04:50:29,999 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11400, loss[loss=0.1094, beats_loss=0.01131, ecapa_loss=0.000153, whisper_loss=0.09654, over 22222.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01079, ecapa_loss=0.0001673, whisper_loss=0.09246, over 3883730.98 frames. ], batch size: 88, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:50:46,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1998090.0, ans=0.125 2024-08-13 04:50:54,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1998090.0, ans=0.125 2024-08-13 04:51:17,024 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 04:51:39,732 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.469e+01 2.790e+01 3.072e+01 4.491e+01, threshold=5.580e+01, percent-clipped=0.0 2024-08-13 04:52:03,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11450, loss[loss=0.0907, beats_loss=0.01177, ecapa_loss=0.0001607, whisper_loss=0.07733, over 21880.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01093, ecapa_loss=0.0001672, whisper_loss=0.09124, over 3861412.04 frames. ], batch size: 89, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:52:20,525 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 18 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-13 04:52:44,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-13 04:52:46,615 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:52:52,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=1998690.0, ans=0.125 2024-08-13 04:52:58,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1998790.0, ans=0.07 2024-08-13 04:52:58,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2024-08-13 04:52:59,371 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 04:53:01,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1998790.0, ans=0.1 2024-08-13 04:53:04,684 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 04:53:11,729 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:53:20,574 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 35 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 04:53:22,948 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 04:53:33,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1998890.0, ans=0.125 2024-08-13 04:53:38,119 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11500, loss[loss=0.102, beats_loss=0.01002, ecapa_loss=0.0001848, whisper_loss=0.09013, over 16827.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001676, whisper_loss=0.09251, over 3895700.46 frames. ], batch size: 69, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:53:55,270 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 04:53:57,098 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 21 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 04:53:57,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1999090.0, ans=0.2 2024-08-13 04:53:57,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=12.0 2024-08-13 04:54:19,042 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
22 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 04:54:28,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1999190.0, ans=0.0 2024-08-13 04:54:33,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1999290.0, ans=0.0 2024-08-13 04:54:39,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1999290.0, ans=0.0 2024-08-13 04:54:45,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.530e+01 2.837e+01 3.156e+01 6.576e+01, threshold=5.675e+01, percent-clipped=1.0 2024-08-13 04:54:47,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1999290.0, ans=0.0 2024-08-13 04:54:55,529 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 04:55:05,165 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 04:55:06,273 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 18 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 04:55:07,895 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11550, loss[loss=0.087, beats_loss=0.01322, ecapa_loss=0.0001647, whisper_loss=0.07213, over 18810.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001674, whisper_loss=0.09203, over 3900811.08 frames. 
], batch size: 79, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:55:24,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1999590.0, ans=0.125 2024-08-13 04:55:39,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1999590.0, ans=0.125 2024-08-13 04:55:49,785 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-13 04:55:50,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1999690.0, ans=0.1 2024-08-13 04:55:51,709 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 29 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 04:55:58,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1999690.0, ans=0.125 2024-08-13 04:56:10,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1999790.0, ans=0.125 2024-08-13 04:56:21,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1999890.0, ans=0.125 2024-08-13 04:56:40,456 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-200000.pt 2024-08-13 04:56:43,776 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11600, loss[loss=0.09989, beats_loss=0.01206, ecapa_loss=0.0001651, whisper_loss=0.08618, over 23150.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001675, whisper_loss=0.09145, over 3923913.65 frames. 
], batch size: 95, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:56:44,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1999990.0, ans=0.1 2024-08-13 04:56:47,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.72 vs. limit=10.0 2024-08-13 04:57:04,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2024-08-13 04:57:15,869 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 04:57:41,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2000290.0, ans=0.1 2024-08-13 04:57:56,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.423e+01 2.636e+01 2.832e+01 7.836e+01, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 04:58:02,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2000390.0, ans=0.125 2024-08-13 04:58:06,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2000390.0, ans=0.125 2024-08-13 04:58:12,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2000390.0, ans=0.0 2024-08-13 04:58:18,639 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-13 04:58:22,169 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11650, loss[loss=0.07046, beats_loss=0.01131, ecapa_loss=0.0001718, whisper_loss=0.05743, over 17159.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01075, ecapa_loss=0.0001674, whisper_loss=0.09182, over 3903456.11 frames. ], batch size: 71, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 04:58:30,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2000490.0, ans=0.0 2024-08-13 04:58:41,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-08-13 04:58:47,459 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-13 04:58:48,979 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 04:59:11,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2000690.0, ans=0.125 2024-08-13 04:59:29,399 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 04:59:37,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. limit=10.0 2024-08-13 04:59:56,918 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11700, loss[loss=0.1024, beats_loss=0.01232, ecapa_loss=0.0001463, whisper_loss=0.08865, over 21071.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001668, whisper_loss=0.09154, over 3930352.28 frames. ], batch size: 83, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:00:04,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2000990.0, ans=0.125 2024-08-13 05:00:06,253 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 05:00:06,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2000990.0, ans=0.0 2024-08-13 05:00:32,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2001090.0, ans=0.0 2024-08-13 05:00:35,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2001190.0, ans=0.0 2024-08-13 05:00:37,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2001190.0, ans=0.125 2024-08-13 05:00:40,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2001190.0, ans=0.125 2024-08-13 05:00:43,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:01:01,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-13 05:01:07,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.358e+01 2.707e+01 3.132e+01 5.516e+01, threshold=5.414e+01, percent-clipped=1.0 2024-08-13 05:01:28,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2001490.0, ans=0.125 2024-08-13 05:01:30,499 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11750, loss[loss=0.1193, beats_loss=0.01006, ecapa_loss=0.0001311, whisper_loss=0.1079, over 24172.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01098, ecapa_loss=0.0001657, whisper_loss=0.09155, over 3957423.77 frames. ], batch size: 90, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:01:34,439 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 05:01:49,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-13 05:01:51,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2001590.0, ans=0.95 2024-08-13 05:01:56,013 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 05:02:04,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2001590.0, ans=0.125 2024-08-13 05:02:05,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2001690.0, ans=0.0 2024-08-13 05:02:12,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2001690.0, ans=0.125 2024-08-13 05:02:25,199 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 05:02:36,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2001790.0, ans=0.125 2024-08-13 05:02:56,523 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 05:03:01,862 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 05:03:03,247 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11800, loss[loss=0.107, beats_loss=0.01196, ecapa_loss=0.0001315, whisper_loss=0.09368, over 22365.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01093, ecapa_loss=0.0001658, whisper_loss=0.09242, over 3987270.05 frames. 
], batch size: 86, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:03:05,063 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-13 05:03:05,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2001990.0, ans=0.125 2024-08-13 05:03:27,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2002090.0, ans=0.125 2024-08-13 05:03:37,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2002190.0, ans=0.5 2024-08-13 05:03:43,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2002190.0, ans=0.125 2024-08-13 05:04:04,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2002290.0, ans=0.0 2024-08-13 05:04:06,128 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+01 2.539e+01 2.830e+01 3.148e+01 9.366e+01, threshold=5.659e+01, percent-clipped=1.0 2024-08-13 05:04:10,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2002390.0, ans=0.1 2024-08-13 05:04:13,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2002390.0, ans=0.5 2024-08-13 05:04:29,038 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11850, loss[loss=0.1032, beats_loss=0.009676, ecapa_loss=0.0001522, whisper_loss=0.09202, over 20155.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01095, ecapa_loss=0.0001664, whisper_loss=0.09184, over 3964598.58 frames. ], batch size: 78, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:04:39,060 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 05:04:41,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-13 05:05:03,513 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 05:05:07,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2002690.0, ans=0.0 2024-08-13 05:05:07,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2002690.0, ans=0.0 2024-08-13 05:05:12,766 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 39 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 05:05:20,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5 2024-08-13 05:05:30,085 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-13 05:05:31,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2024-08-13 05:05:37,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2002790.0, ans=0.125 2024-08-13 05:05:48,335 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-13 05:05:57,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11900, loss[loss=0.1036, beats_loss=0.0115, ecapa_loss=0.0001525, whisper_loss=0.09061, over 22558.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01095, ecapa_loss=0.0001673, whisper_loss=0.09258, over 3989221.32 frames. 
], batch size: 92, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:06:06,054 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 05:06:26,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2003090.0, ans=0.125 2024-08-13 05:06:34,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. limit=6.0 2024-08-13 05:06:44,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2003190.0, ans=0.125 2024-08-13 05:06:45,688 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-13 05:06:47,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2003290.0, ans=0.125 2024-08-13 05:07:00,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.520e+01 2.675e+01 3.005e+01 5.998e+01, threshold=5.349e+01, percent-clipped=1.0 2024-08-13 05:07:03,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=2003290.0, ans=12.0 2024-08-13 05:07:21,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2003390.0, ans=0.125 2024-08-13 05:07:23,697 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 11950, loss[loss=0.1185, beats_loss=0.01156, ecapa_loss=0.0001851, whisper_loss=0.1051, over 16051.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01091, ecapa_loss=0.0001676, whisper_loss=0.09197, over 3938445.12 frames. 
], batch size: 69, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:07:50,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-13 05:07:55,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2003590.0, ans=0.125 2024-08-13 05:07:57,574 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-13 05:07:58,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-08-13 05:07:59,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2003690.0, ans=0.0 2024-08-13 05:08:06,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-13 05:08:18,556 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-13 05:08:18,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2003790.0, ans=0.2 2024-08-13 05:08:22,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2003790.0, ans=0.125 2024-08-13 05:08:27,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2003790.0, ans=0.0 2024-08-13 05:08:34,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-08-13 05:08:40,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2003890.0, ans=0.0 2024-08-13 05:08:49,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2003990.0, ans=0.125 2024-08-13 05:08:50,053 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12000, loss[loss=0.09662, beats_loss=0.01155, ecapa_loss=0.0001453, whisper_loss=0.08361, over 21974.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001679, whisper_loss=0.09189, over 3928497.73 frames. ], batch size: 86, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:08:50,055 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 05:09:29,197 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on ASR_libri: loss=0.2526, beats_loss=0, ecapa_loss=0.0005731, whisper_loss=0.2468, over 922467.00 frames. 2024-08-13 05:09:48,310 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on SV_voxceleb1: loss=0.004602, beats_loss=0, ecapa_loss=0.0004602, whisper_loss=0, over 939242.00 frames. 2024-08-13 05:11:41,002 INFO [train_multi_KD3.py:1149] (0/4) Epoch 14, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 05:11:41,006 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 05:11:52,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2003990.0, ans=0.125 2024-08-13 05:12:07,261 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 05:12:07,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2004090.0, ans=0.0 2024-08-13 05:12:07,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2004090.0, ans=0.2 2024-08-13 05:12:33,335 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-13 05:12:43,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.475e+01 2.665e+01 3.111e+01 1.048e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 05:12:54,893 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 05:13:04,536 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12050, loss[loss=0.09357, beats_loss=0.01167, ecapa_loss=0.0001985, whisper_loss=0.07991, over 14969.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01091, ecapa_loss=0.0001693, whisper_loss=0.09114, over 3868790.91 frames. ], batch size: 63, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:13:11,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2004490.0, ans=0.0 2024-08-13 05:13:16,601 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.670e-03 2024-08-13 05:13:26,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2004590.0, ans=0.125 2024-08-13 05:13:33,611 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-13 05:13:55,138 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
21 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-13 05:14:03,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2004790.0, ans=0.0 2024-08-13 05:14:10,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2004890.0, ans=0.125 2024-08-13 05:14:20,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2004890.0, ans=0.1 2024-08-13 05:14:28,960 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12100, loss[loss=0.1057, beats_loss=0.009523, ecapa_loss=0.0002312, whisper_loss=0.0939, over 21510.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001699, whisper_loss=0.09077, over 3847338.54 frames. ], batch size: 92, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:14:32,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2004990.0, ans=0.2 2024-08-13 05:14:47,139 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-13 05:15:19,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2005290.0, ans=0.0 2024-08-13 05:15:27,556 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-13 05:15:31,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.461e+01 2.696e+01 3.254e+01 5.243e+01, threshold=5.392e+01, percent-clipped=0.0 2024-08-13 05:15:34,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2005390.0, ans=0.1 2024-08-13 05:15:49,356 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
26 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-13 05:15:52,070 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12150, loss[loss=0.1, beats_loss=0.01127, ecapa_loss=0.0001718, whisper_loss=0.08702, over 21700.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01101, ecapa_loss=0.0001683, whisper_loss=0.09013, over 3845608.63 frames. ], batch size: 90, lr: 4.44e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:15:52,196 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 05:16:12,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-08-13 05:16:27,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2005690.0, ans=0.0 2024-08-13 05:16:32,423 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 24 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 05:16:40,964 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 05:17:07,182 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-13 05:17:08,594 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 05:17:17,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12200, loss[loss=0.1204, beats_loss=0.009171, ecapa_loss=0.0001424, whisper_loss=0.1098, over 23361.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01103, ecapa_loss=0.0001669, whisper_loss=0.09033, over 3841518.83 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:17:17,624 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
21 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-13 05:17:47,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2006090.0, ans=0.125 2024-08-13 05:17:48,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2006090.0, ans=0.125 2024-08-13 05:18:21,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.467e+01 2.824e+01 3.197e+01 4.821e+01, threshold=5.649e+01, percent-clipped=0.0 2024-08-13 05:18:26,830 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-13 05:18:42,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12250, loss[loss=0.08787, beats_loss=0.01479, ecapa_loss=0.0001282, whisper_loss=0.0718, over 21755.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01099, ecapa_loss=0.0001672, whisper_loss=0.0905, over 3862113.39 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:18:44,661 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 05:18:55,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0 2024-08-13 05:19:15,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2006690.0, ans=0.1 2024-08-13 05:19:42,791 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 05:19:55,933 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 05:19:56,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2006890.0, ans=0.125 2024-08-13 05:20:01,093 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 05:20:01,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2006890.0, ans=0.0 2024-08-13 05:20:01,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2006890.0, ans=0.0 2024-08-13 05:20:02,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2006890.0, ans=0.125 2024-08-13 05:20:04,691 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12300, loss[loss=0.114, beats_loss=0.01022, ecapa_loss=0.0002012, whisper_loss=0.1018, over 21562.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01092, ecapa_loss=0.0001676, whisper_loss=0.09113, over 3866023.43 frames. ], batch size: 88, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:20:13,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2006990.0, ans=0.05 2024-08-13 05:20:14,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-13 05:20:22,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2024-08-13 05:20:50,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2007190.0, ans=0.125 2024-08-13 05:21:06,650 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.461e+01 2.771e+01 3.048e+01 4.529e+01, threshold=5.542e+01, percent-clipped=0.0 2024-08-13 05:21:10,223 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 05:21:21,112 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 05:21:22,631 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 05:21:30,678 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12350, loss[loss=0.1058, beats_loss=0.009152, ecapa_loss=0.0002098, whisper_loss=0.09458, over 13059.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01093, ecapa_loss=0.0001676, whisper_loss=0.09128, over 3870837.41 frames. ], batch size: 53, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:22:00,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-08-13 05:22:01,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2007590.0, ans=0.1 2024-08-13 05:22:03,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-13 05:22:07,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2007690.0, ans=0.0 2024-08-13 05:22:16,533 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 05:22:23,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2007790.0, ans=0.125 2024-08-13 05:22:33,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2007790.0, ans=0.125 2024-08-13 05:22:34,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2007790.0, ans=0.0 2024-08-13 05:22:37,028 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-13 05:22:45,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2007890.0, ans=0.125 2024-08-13 05:22:47,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2007890.0, ans=0.1 2024-08-13 05:22:53,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2007890.0, ans=0.125 2024-08-13 05:22:55,729 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12400, loss[loss=0.1066, beats_loss=0.0097, ecapa_loss=0.000165, whisper_loss=0.09526, over 19371.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001674, whisper_loss=0.09155, over 3896126.70 frames. 
], batch size: 75, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:23:03,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2007990.0, ans=0.125 2024-08-13 05:23:23,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2008090.0, ans=0.125 2024-08-13 05:23:28,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2008190.0, ans=0.07 2024-08-13 05:23:39,335 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 19 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 05:23:43,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.26 vs. limit=6.0 2024-08-13 05:23:51,876 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-13 05:23:59,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.499e+01 2.802e+01 3.094e+01 1.002e+02, threshold=5.604e+01, percent-clipped=2.0 2024-08-13 05:24:10,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2008390.0, ans=0.1 2024-08-13 05:24:22,319 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12450, loss[loss=0.1041, beats_loss=0.008275, ecapa_loss=0.0001575, whisper_loss=0.0942, over 14319.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01082, ecapa_loss=0.0001692, whisper_loss=0.09134, over 3890238.87 frames. ], batch size: 54, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:24:27,876 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 05:24:32,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2008490.0, ans=0.0 2024-08-13 05:24:49,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2008590.0, ans=0.125 2024-08-13 05:24:51,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2008590.0, ans=0.0 2024-08-13 05:25:09,416 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 05:25:24,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2008790.0, ans=0.125 2024-08-13 05:25:37,031 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 05:25:50,530 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12500, loss[loss=0.1065, beats_loss=0.01035, ecapa_loss=0.0001897, whisper_loss=0.09422, over 20813.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01077, ecapa_loss=0.0001698, whisper_loss=0.09142, over 3873556.52 frames. 
], batch size: 84, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:25:52,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2008990.0, ans=0.1 2024-08-13 05:26:38,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2009190.0, ans=0.125 2024-08-13 05:26:39,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2009290.0, ans=0.0 2024-08-13 05:26:40,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2009290.0, ans=0.125 2024-08-13 05:26:51,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-13 05:26:51,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.389e+01 2.676e+01 3.149e+01 9.586e+01, threshold=5.353e+01, percent-clipped=2.0 2024-08-13 05:27:04,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2009390.0, ans=0.0 2024-08-13 05:27:14,367 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12550, loss[loss=0.1081, beats_loss=0.007511, ecapa_loss=0.0001278, whisper_loss=0.09931, over 14974.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001693, whisper_loss=0.09156, over 3907972.52 frames. ], batch size: 54, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:27:22,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2009490.0, ans=0.1 2024-08-13 05:27:40,564 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
25 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 05:27:44,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2009590.0, ans=0.0 2024-08-13 05:28:14,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.04 vs. limit=10.0 2024-08-13 05:28:17,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2009790.0, ans=0.0 2024-08-13 05:28:17,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2009790.0, ans=0.2 2024-08-13 05:28:18,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2009890.0, ans=0.2 2024-08-13 05:28:25,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2009890.0, ans=0.125 2024-08-13 05:28:34,618 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-13 05:28:36,017 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12600, loss[loss=0.09532, beats_loss=0.01196, ecapa_loss=0.0001247, whisper_loss=0.08211, over 22323.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01089, ecapa_loss=0.0001697, whisper_loss=0.09166, over 3925689.96 frames. ], batch size: 86, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:29:16,809 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 12 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 05:29:23,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2010290.0, ans=0.0 2024-08-13 05:29:31,235 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 05:29:34,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2010290.0, ans=0.2 2024-08-13 05:29:36,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.351e+01 2.664e+01 2.979e+01 4.679e+01, threshold=5.327e+01, percent-clipped=0.0 2024-08-13 05:29:41,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2010390.0, ans=0.09899494936611666 2024-08-13 05:29:42,003 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 05:29:51,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2024-08-13 05:29:57,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12650, loss[loss=0.1047, beats_loss=0.01107, ecapa_loss=0.000161, whisper_loss=0.09198, over 23006.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01096, ecapa_loss=0.0001689, whisper_loss=0.09142, over 3905018.57 frames. ], batch size: 91, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:29:58,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2010490.0, ans=0.1 2024-08-13 05:30:05,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2024-08-13 05:30:09,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2010490.0, ans=0.1 2024-08-13 05:30:11,037 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 05:30:18,838 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 05:30:40,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2010690.0, ans=0.125 2024-08-13 05:30:51,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-13 05:30:52,891 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-13 05:30:54,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2010790.0, ans=0.125 2024-08-13 05:30:56,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-08-13 05:31:06,492 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 05:31:21,815 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12700, loss[loss=0.1029, beats_loss=0.01191, ecapa_loss=0.0001477, whisper_loss=0.08954, over 13662.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01095, ecapa_loss=0.0001684, whisper_loss=0.09221, over 3901035.81 frames. ], batch size: 53, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:31:36,449 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 05:31:55,914 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 05:32:04,346 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 05:32:10,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2011290.0, ans=0.0 2024-08-13 05:32:14,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2011290.0, ans=0.125 2024-08-13 05:32:21,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.465e+01 2.775e+01 3.008e+01 5.404e+01, threshold=5.550e+01, percent-clipped=1.0 2024-08-13 05:32:42,967 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12750, loss[loss=0.1118, beats_loss=0.01107, ecapa_loss=0.0001726, whisper_loss=0.09901, over 22389.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01105, ecapa_loss=0.0001682, whisper_loss=0.09199, over 3883325.12 frames. ], batch size: 90, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:32:56,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2024-08-13 05:33:32,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2011790.0, ans=0.1 2024-08-13 05:33:48,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2011890.0, ans=0.2 2024-08-13 05:33:54,026 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 05:34:03,851 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12800, loss[loss=0.1107, beats_loss=0.01179, ecapa_loss=0.0001736, whisper_loss=0.09716, over 21790.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01102, ecapa_loss=0.0001683, whisper_loss=0.09189, over 3879191.31 frames. 
], batch size: 90, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:34:37,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2012190.0, ans=0.125 2024-08-13 05:34:38,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2012190.0, ans=0.0 2024-08-13 05:34:42,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-13 05:34:48,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 05:34:59,676 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 05:35:00,018 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.926e-02 2024-08-13 05:35:05,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.426e+01 2.719e+01 3.089e+01 6.356e+01, threshold=5.438e+01, percent-clipped=2.0 2024-08-13 05:35:15,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2024-08-13 05:35:22,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2012390.0, ans=0.07 2024-08-13 05:35:27,231 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12850, loss[loss=0.088, beats_loss=0.01317, ecapa_loss=0.0001448, whisper_loss=0.07338, over 19745.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01107, ecapa_loss=0.0001681, whisper_loss=0.09154, over 3882467.12 frames. 
], batch size: 78, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:35:39,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2012490.0, ans=0.05 2024-08-13 05:35:39,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2012490.0, ans=0.05 2024-08-13 05:36:05,994 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-13 05:36:06,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2012690.0, ans=0.125 2024-08-13 05:36:41,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2012890.0, ans=0.0 2024-08-13 05:36:43,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2012890.0, ans=0.125 2024-08-13 05:36:47,383 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12900, loss[loss=0.1007, beats_loss=0.009756, ecapa_loss=0.0002267, whisper_loss=0.0887, over 20069.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01097, ecapa_loss=0.000169, whisper_loss=0.09131, over 3855074.31 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 05:36:59,139 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-13 05:37:05,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2013090.0, ans=0.0 2024-08-13 05:37:08,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2013090.0, ans=0.0 2024-08-13 05:37:13,141 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-13 05:37:14,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-08-13 05:37:14,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2013090.0, ans=0.125 2024-08-13 05:37:19,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2013190.0, ans=0.125 2024-08-13 05:37:20,567 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-13 05:37:22,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-13 05:37:23,519 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 05:37:37,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2013290.0, ans=0.0 2024-08-13 05:37:44,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.357e+01 2.603e+01 2.918e+01 4.145e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-13 05:37:57,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2013390.0, ans=0.0 2024-08-13 05:38:01,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2013390.0, ans=0.0 2024-08-13 05:38:03,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2013390.0, ans=0.1 2024-08-13 05:38:04,332 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 05:38:07,036 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 12950, loss[loss=0.07367, beats_loss=0.0106, ecapa_loss=0.0001984, whisper_loss=0.06109, over 20280.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01092, ecapa_loss=0.0001687, whisper_loss=0.09031, over 3829749.73 frames. ], batch size: 87, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:38:11,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2024-08-13 05:38:15,514 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 16 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 05:38:17,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2013490.0, ans=0.125 2024-08-13 05:38:26,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2013590.0, ans=0.2 2024-08-13 05:38:34,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2024-08-13 05:38:37,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2013590.0, ans=0.125 2024-08-13 05:38:49,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. 
limit=15.0 2024-08-13 05:39:17,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2013890.0, ans=0.0 2024-08-13 05:39:22,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2013890.0, ans=0.125 2024-08-13 05:39:30,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13000, loss[loss=0.08395, beats_loss=0.01131, ecapa_loss=0.0001872, whisper_loss=0.07077, over 17045.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01093, ecapa_loss=0.000168, whisper_loss=0.09048, over 3845207.24 frames. ], batch size: 71, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:39:31,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2024-08-13 05:39:32,720 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-13 05:39:56,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2014090.0, ans=0.125 2024-08-13 05:40:04,703 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-13 05:40:19,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2014290.0, ans=0.025 2024-08-13 05:40:24,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2014290.0, ans=0.0 2024-08-13 05:40:25,567 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
21 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-13 05:40:31,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.457e+01 2.798e+01 3.261e+01 6.703e+01, threshold=5.596e+01, percent-clipped=3.0 2024-08-13 05:40:35,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2014390.0, ans=0.025 2024-08-13 05:40:37,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2014390.0, ans=0.2 2024-08-13 05:40:47,908 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 05:40:50,299 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 05:40:52,665 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13050, loss[loss=0.1312, beats_loss=0.009926, ecapa_loss=0.0001646, whisper_loss=0.1196, over 23526.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01087, ecapa_loss=0.0001675, whisper_loss=0.09116, over 3880597.16 frames. 
], batch size: 93, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:40:58,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2014490.0, ans=0.125 2024-08-13 05:41:01,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2014490.0, ans=0.125 2024-08-13 05:41:05,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2014490.0, ans=0.125 2024-08-13 05:41:12,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2014590.0, ans=0.0 2024-08-13 05:41:12,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2014590.0, ans=0.125 2024-08-13 05:41:13,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2014590.0, ans=0.07 2024-08-13 05:41:25,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2014690.0, ans=0.125 2024-08-13 05:41:40,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2014790.0, ans=0.2 2024-08-13 05:41:43,672 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 05:41:51,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2014790.0, ans=0.0 2024-08-13 05:41:54,809 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 05:42:12,428 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13100, loss[loss=0.09419, beats_loss=0.01163, ecapa_loss=0.0001598, whisper_loss=0.08097, over 22702.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001667, whisper_loss=0.09058, over 3867005.40 frames. ], batch size: 93, lr: 4.43e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:42:14,020 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 05:42:20,988 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 05:42:22,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2014990.0, ans=0.125 2024-08-13 05:42:32,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2015090.0, ans=0.125 2024-08-13 05:42:39,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2015090.0, ans=0.0 2024-08-13 05:42:56,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2015190.0, ans=0.2 2024-08-13 05:43:08,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=12.0 2024-08-13 05:43:12,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.747e+01 3.007e+01 5.883e+01, threshold=5.493e+01, percent-clipped=1.0 2024-08-13 05:43:15,056 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 05:43:21,213 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 05:43:31,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2015390.0, ans=0.125 2024-08-13 05:43:33,698 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13150, loss[loss=0.1091, beats_loss=0.009427, ecapa_loss=0.0001627, whisper_loss=0.09805, over 18054.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01087, ecapa_loss=0.0001665, whisper_loss=0.09111, over 3853638.16 frames. ], batch size: 71, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:43:33,933 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-13 05:43:37,180 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 05:43:43,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=15.0 2024-08-13 05:43:50,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2015590.0, ans=0.125 2024-08-13 05:44:00,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5 2024-08-13 05:44:03,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2015590.0, ans=0.125 2024-08-13 05:44:09,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2015690.0, ans=0.125 2024-08-13 05:44:12,293 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 
33 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 05:44:12,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2015690.0, ans=0.1 2024-08-13 05:44:17,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2015690.0, ans=0.0 2024-08-13 05:44:18,977 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 05:44:20,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2015790.0, ans=0.1 2024-08-13 05:44:23,812 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 05:44:37,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2015890.0, ans=0.125 2024-08-13 05:44:54,127 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13200, loss[loss=0.1122, beats_loss=0.008464, ecapa_loss=0.0001688, whisper_loss=0.1021, over 17350.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001661, whisper_loss=0.09104, over 3876413.95 frames. ], batch size: 64, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:45:02,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2015990.0, ans=0.125 2024-08-13 05:45:02,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2015990.0, ans=0.125 2024-08-13 05:45:16,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2016090.0, ans=0.2 2024-08-13 05:45:36,147 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 05:45:45,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2016290.0, ans=0.0 2024-08-13 05:45:53,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.725e+01 2.981e+01 4.895e+01, threshold=5.450e+01, percent-clipped=0.0 2024-08-13 05:45:59,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2016390.0, ans=0.0 2024-08-13 05:46:06,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2016390.0, ans=0.125 2024-08-13 05:46:14,910 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13250, loss[loss=0.1109, beats_loss=0.01028, ecapa_loss=0.0001945, whisper_loss=0.09871, over 18093.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01089, ecapa_loss=0.0001671, whisper_loss=0.09094, over 3847898.80 frames. ], batch size: 77, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:46:15,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2016490.0, ans=0.0 2024-08-13 05:46:20,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2016490.0, ans=0.0 2024-08-13 05:46:30,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2016590.0, ans=0.1 2024-08-13 05:46:32,102 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 05:46:36,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2016590.0, ans=0.0 2024-08-13 05:47:25,686 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
19 from LS+wenet, 27 from Vox, 49 fro AS 2024-08-13 05:47:40,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2016990.0, ans=0.015 2024-08-13 05:47:41,110 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13300, loss[loss=0.1005, beats_loss=0.01032, ecapa_loss=0.0001672, whisper_loss=0.08847, over 17074.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001665, whisper_loss=0.09059, over 3816408.07 frames. ], batch size: 68, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:47:56,531 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 35 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 05:48:42,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.445e+01 2.718e+01 3.162e+01 4.686e+01, threshold=5.435e+01, percent-clipped=0.0 2024-08-13 05:48:43,194 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 05:48:48,570 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.328e+05 2024-08-13 05:48:56,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2017390.0, ans=0.1 2024-08-13 05:49:01,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2017390.0, ans=0.125 2024-08-13 05:49:03,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13350, loss[loss=0.1012, beats_loss=0.01091, ecapa_loss=0.0001543, whisper_loss=0.08871, over 23477.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001656, whisper_loss=0.09125, over 3837058.55 frames. 
], batch size: 92, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:49:18,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2017490.0, ans=0.125 2024-08-13 05:49:20,283 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.637e-02 2024-08-13 05:49:21,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2017590.0, ans=0.0 2024-08-13 05:49:29,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2017590.0, ans=0.0 2024-08-13 05:49:38,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2017690.0, ans=0.125 2024-08-13 05:49:42,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2017690.0, ans=0.125 2024-08-13 05:49:44,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2017690.0, ans=0.125 2024-08-13 05:49:46,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2024-08-13 05:50:02,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2017790.0, ans=0.2 2024-08-13 05:50:26,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13400, loss[loss=0.08717, beats_loss=0.01302, ecapa_loss=0.0001405, whisper_loss=0.07274, over 13435.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01091, ecapa_loss=0.0001662, whisper_loss=0.09118, over 3856878.87 frames. 
], batch size: 54, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:50:39,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2017990.0, ans=0.125 2024-08-13 05:50:43,196 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 05:51:02,211 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 05:51:03,785 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-13 05:51:07,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2018190.0, ans=0.1 2024-08-13 05:51:12,108 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 05:51:17,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2018290.0, ans=0.2 2024-08-13 05:51:20,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2018290.0, ans=0.125 2024-08-13 05:51:22,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2018290.0, ans=0.07 2024-08-13 05:51:26,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2018290.0, ans=0.2 2024-08-13 05:51:28,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.489e+01 2.760e+01 3.071e+01 5.716e+01, threshold=5.519e+01, percent-clipped=1.0 2024-08-13 05:51:39,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2018390.0, ans=0.1 2024-08-13 05:51:39,267 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2018390.0, ans=0.0 2024-08-13 05:51:49,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2018490.0, ans=0.125 2024-08-13 05:51:50,199 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13450, loss[loss=0.1042, beats_loss=0.009945, ecapa_loss=0.0001543, whisper_loss=0.09274, over 16172.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0109, ecapa_loss=0.0001656, whisper_loss=0.09115, over 3838712.78 frames. ], batch size: 62, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:52:11,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2018590.0, ans=0.125 2024-08-13 05:52:31,042 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-13 05:52:33,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2018690.0, ans=0.0 2024-08-13 05:53:05,294 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 05:53:08,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2018890.0, ans=0.125 2024-08-13 05:53:14,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13500, loss[loss=0.09712, beats_loss=0.01349, ecapa_loss=0.0001754, whisper_loss=0.08187, over 20329.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01082, ecapa_loss=0.0001665, whisper_loss=0.09191, over 3836892.14 frames. 
], batch size: 88, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:53:16,893 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.045e+01 2024-08-13 05:53:23,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2018990.0, ans=0.1 2024-08-13 05:53:24,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2018990.0, ans=0.2 2024-08-13 05:53:28,068 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 05:53:30,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2024-08-13 05:53:31,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2019090.0, ans=0.125 2024-08-13 05:53:41,796 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 05:53:46,554 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 05:53:52,614 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-13 05:54:17,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.519e+01 2.845e+01 3.228e+01 5.669e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 05:54:18,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2024-08-13 05:54:26,108 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 05:54:39,049 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13550, loss[loss=0.07951, beats_loss=0.01444, ecapa_loss=0.0001429, whisper_loss=0.06364, over 21917.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001654, whisper_loss=0.09147, over 3862973.46 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:54:45,883 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-13 05:54:57,439 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 05:55:25,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2019690.0, ans=0.0 2024-08-13 05:55:35,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2019790.0, ans=0.125 2024-08-13 05:55:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2019890.0, ans=0.0 2024-08-13 05:56:02,799 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13600, loss[loss=0.09656, beats_loss=0.01296, ecapa_loss=0.0001531, whisper_loss=0.08207, over 21326.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01092, ecapa_loss=0.0001656, whisper_loss=0.09199, over 3875115.87 frames. ], batch size: 86, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:56:11,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2019990.0, ans=0.2 2024-08-13 05:56:19,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2020090.0, ans=0.125 2024-08-13 05:56:27,891 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 05:56:57,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2020290.0, ans=0.1 2024-08-13 05:57:00,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2020290.0, ans=0.125 2024-08-13 05:57:03,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+01 2.439e+01 2.789e+01 3.158e+01 4.809e+01, threshold=5.578e+01, percent-clipped=0.0 2024-08-13 05:57:25,489 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13650, loss[loss=0.09383, beats_loss=0.01057, ecapa_loss=0.0001556, whisper_loss=0.08171, over 16861.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001675, whisper_loss=0.09195, over 3895168.95 frames. ], batch size: 65, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:57:28,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-13 05:57:37,209 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 05:57:50,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2020590.0, ans=0.0 2024-08-13 05:57:58,076 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-13 05:58:02,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2020690.0, ans=0.125 2024-08-13 05:58:17,383 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 05:58:35,802 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
25 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 05:58:45,112 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13700, loss[loss=0.09587, beats_loss=0.01079, ecapa_loss=0.0001599, whisper_loss=0.08348, over 20659.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.0109, ecapa_loss=0.0001675, whisper_loss=0.09234, over 3912326.76 frames. ], batch size: 82, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 05:58:46,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2020990.0, ans=0.125 2024-08-13 05:58:54,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2020990.0, ans=0.125 2024-08-13 05:58:59,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2021090.0, ans=0.125 2024-08-13 05:59:03,867 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 05:59:18,191 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-13 05:59:18,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2021190.0, ans=0.0 2024-08-13 05:59:24,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2021190.0, ans=0.95 2024-08-13 05:59:28,786 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 20 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 05:59:30,495 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
27 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 05:59:40,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.485e+01 2.717e+01 3.143e+01 5.833e+01, threshold=5.434e+01, percent-clipped=2.0 2024-08-13 05:59:53,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2024-08-13 05:59:58,602 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13750, loss[loss=0.1002, beats_loss=0.01167, ecapa_loss=0.0001699, whisper_loss=0.08686, over 21755.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01089, ecapa_loss=0.0001666, whisper_loss=0.09257, over 3911330.42 frames. ], batch size: 90, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:00:07,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-13 06:00:27,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2021690.0, ans=0.015 2024-08-13 06:00:27,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2021690.0, ans=0.0 2024-08-13 06:00:35,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2021690.0, ans=0.0 2024-08-13 06:00:39,062 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 06:00:48,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2021790.0, ans=0.125 2024-08-13 06:00:48,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=15.0 2024-08-13 06:01:07,477 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13800, loss[loss=0.08238, beats_loss=0.008367, ecapa_loss=0.0001817, whisper_loss=0.07219, over 14710.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01084, ecapa_loss=0.0001659, whisper_loss=0.09242, over 3914409.09 frames. ], batch size: 60, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:01:11,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2021990.0, ans=0.2 2024-08-13 06:01:13,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2021990.0, ans=0.125 2024-08-13 06:01:15,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-08-13 06:01:24,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2022090.0, ans=0.1 2024-08-13 06:01:44,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2022190.0, ans=0.125 2024-08-13 06:01:44,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.33 vs. 
limit=22.5 2024-08-13 06:01:55,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2022290.0, ans=0.1 2024-08-13 06:01:57,712 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.404e+01 2.696e+01 2.984e+01 4.554e+01, threshold=5.391e+01, percent-clipped=0.0 2024-08-13 06:02:02,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2022390.0, ans=0.125 2024-08-13 06:02:02,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2022390.0, ans=0.025 2024-08-13 06:02:11,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-08-13 06:02:15,270 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13850, loss[loss=0.1258, beats_loss=0.00885, ecapa_loss=0.0001647, whisper_loss=0.1153, over 16774.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01081, ecapa_loss=0.0001654, whisper_loss=0.09294, over 3901843.42 frames. 
], batch size: 63, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:02:28,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2022590.0, ans=6.0 2024-08-13 06:02:32,476 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.112e+01 2024-08-13 06:02:41,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2022590.0, ans=0.07 2024-08-13 06:02:51,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2022690.0, ans=0.0 2024-08-13 06:02:56,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-13 06:03:15,133 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 06:03:24,522 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13900, loss[loss=0.098, beats_loss=0.01236, ecapa_loss=0.0001759, whisper_loss=0.08388, over 20729.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01076, ecapa_loss=0.0001662, whisper_loss=0.09343, over 3911514.23 frames. ], batch size: 87, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:03:25,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2022990.0, ans=0.07 2024-08-13 06:03:32,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-08-13 06:03:33,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2022990.0, ans=0.0 2024-08-13 06:03:37,145 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 06:03:44,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=12.0 2024-08-13 06:04:04,725 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 06:04:11,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2024-08-13 06:04:15,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.449e+01 2.734e+01 3.123e+01 1.484e+02, threshold=5.468e+01, percent-clipped=1.0 2024-08-13 06:04:22,777 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 06:04:28,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2023390.0, ans=0.125 2024-08-13 06:04:33,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 13950, loss[loss=0.1034, beats_loss=0.01149, ecapa_loss=0.000147, whisper_loss=0.0904, over 22706.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01072, ecapa_loss=0.0001665, whisper_loss=0.09329, over 3896439.91 frames. ], batch size: 93, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:04:40,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2023490.0, ans=0.0 2024-08-13 06:04:43,768 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
32 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-13 06:04:45,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2023490.0, ans=0.0 2024-08-13 06:04:49,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:04:58,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2023590.0, ans=0.125 2024-08-13 06:05:11,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2023690.0, ans=0.125 2024-08-13 06:05:19,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2023790.0, ans=0.1 2024-08-13 06:05:24,303 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 06:05:28,100 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 06:05:32,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2023890.0, ans=0.125 2024-08-13 06:05:34,878 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-13 06:05:35,143 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:05:36,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2023890.0, ans=10.0 2024-08-13 06:05:41,735 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14000, loss[loss=0.08889, beats_loss=0.01169, ecapa_loss=0.0001873, whisper_loss=0.07532, over 21482.00 frames. ], tot_loss[loss=0.1057, beats_loss=0.01077, ecapa_loss=0.0001661, whisper_loss=0.09329, over 3903335.92 frames. 
], batch size: 91, lr: 4.42e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:05:54,501 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 06:05:57,127 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-13 06:06:13,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2024190.0, ans=0.125 2024-08-13 06:06:26,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2024290.0, ans=0.125 2024-08-13 06:06:32,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.439e+01 2.688e+01 3.210e+01 4.383e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 06:06:39,770 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 06:06:50,533 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14050, loss[loss=0.1269, beats_loss=0.008925, ecapa_loss=0.0001441, whisper_loss=0.1165, over 17500.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01076, ecapa_loss=0.0001661, whisper_loss=0.09304, over 3891083.26 frames. ], batch size: 64, lr: 4.41e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:06:52,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2024490.0, ans=0.125 2024-08-13 06:07:00,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2024490.0, ans=0.125 2024-08-13 06:07:22,421 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 from AS
2024-08-13 06:07:30,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2024790.0, ans=0.2
2024-08-13 06:07:40,733 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 16 from Vox, 45 from AS
2024-08-13 06:07:53,172 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 from AS
2024-08-13 06:07:55,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2024890.0, ans=0.0
2024-08-13 06:07:59,613 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14100, loss[loss=0.1031, beats_loss=0.01229, ecapa_loss=0.0001703, whisper_loss=0.08906, over 21753.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01083, ecapa_loss=0.0001655, whisper_loss=0.09275, over 3862873.14 frames. ], batch size: 89, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:08:01,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2024990.0, ans=0.2
2024-08-13 06:08:12,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2024-08-13 06:08:15,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2025090.0, ans=0.125
2024-08-13 06:08:18,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5
2024-08-13 06:08:23,957 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 from AS
2024-08-13 06:08:24,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2025090.0, ans=0.125
2024-08-13 06:08:36,195 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 06:08:36,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2025190.0, ans=0.0
2024-08-13 06:08:40,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
2024-08-13 06:08:45,646 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 from AS
2024-08-13 06:08:51,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+01 2.495e+01 2.684e+01 2.972e+01 8.600e+01, threshold=5.367e+01, percent-clipped=1.0
2024-08-13 06:08:57,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2025390.0, ans=0.2
2024-08-13 06:09:01,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0
2024-08-13 06:09:09,031 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14150, loss[loss=0.09456, beats_loss=0.01064, ecapa_loss=0.0001461, whisper_loss=0.08247, over 17195.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01087, ecapa_loss=0.000164, whisper_loss=0.09245, over 3858201.28 frames. ], batch size: 67, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:09:13,583 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 06:09:16,257 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 18 from Vox, 42 from AS
2024-08-13 06:09:22,798 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 from AS
2024-08-13 06:09:37,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2025690.0, ans=0.0
2024-08-13 06:09:50,409 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 from AS
2024-08-13 06:10:02,872 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS
2024-08-13 06:10:15,018 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 14 from Vox, 28 from AS
2024-08-13 06:10:17,687 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14200, loss[loss=0.1017, beats_loss=0.01099, ecapa_loss=0.0001612, whisper_loss=0.08915, over 22507.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01082, ecapa_loss=0.0001636, whisper_loss=0.09257, over 3855070.75 frames. ], batch size: 89, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:10:23,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0
2024-08-13 06:10:31,162 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 from AS
2024-08-13 06:10:43,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2026190.0, ans=0.125
2024-08-13 06:11:07,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.457e+01 2.666e+01 2.949e+01 5.330e+01, threshold=5.333e+01, percent-clipped=0.0
2024-08-13 06:11:19,095 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 06:11:25,873 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14250, loss[loss=0.07573, beats_loss=0.01019, ecapa_loss=0.0002335, whisper_loss=0.06321, over 14643.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001633, whisper_loss=0.09195, over 3864000.81 frames. ], batch size: 62, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:11:44,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2026590.0, ans=0.125
2024-08-13 06:11:45,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2026590.0, ans=0.025
2024-08-13 06:11:59,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.87 vs. limit=6.0
2024-08-13 06:12:00,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2026690.0, ans=0.025
2024-08-13 06:12:16,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2026790.0, ans=0.1
2024-08-13 06:12:26,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0
2024-08-13 06:12:34,328 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14300, loss[loss=0.1098, beats_loss=0.01117, ecapa_loss=0.0001907, whisper_loss=0.09673, over 22087.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0109, ecapa_loss=0.0001643, whisper_loss=0.09175, over 3904148.43 frames. ], batch size: 90, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:12:39,880 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 from AS
2024-08-13 06:13:13,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=15.0
2024-08-13 06:13:18,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2027290.0, ans=0.0
2024-08-13 06:13:24,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.493e+01 2.791e+01 3.138e+01 4.573e+01, threshold=5.581e+01, percent-clipped=0.0
2024-08-13 06:13:33,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0
2024-08-13 06:13:41,853 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14350, loss[loss=0.1001, beats_loss=0.01107, ecapa_loss=0.0001214, whisper_loss=0.08778, over 19721.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0108, ecapa_loss=0.0001642, whisper_loss=0.09255, over 3903894.09 frames. ], batch size: 72, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:13:44,719 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 18 from Vox, 30 from AS
2024-08-13 06:14:19,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2027690.0, ans=0.035
2024-08-13 06:14:24,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2027790.0, ans=0.125
2024-08-13 06:14:24,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-08-13 06:14:25,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2027790.0, ans=0.125
2024-08-13 06:14:32,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2027790.0, ans=0.125
2024-08-13 06:14:36,778 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 from AS
2024-08-13 06:14:38,300 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 from AS
2024-08-13 06:14:39,965 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 from AS
2024-08-13 06:14:42,529 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS
2024-08-13 06:14:45,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2027890.0, ans=0.0
2024-08-13 06:14:52,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14400, loss[loss=0.1174, beats_loss=0.01083, ecapa_loss=0.0001823, whisper_loss=0.1047, over 16866.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01079, ecapa_loss=0.0001647, whisper_loss=0.0931, over 3908999.68 frames. ], batch size: 68, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:14:53,950 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 from AS
2024-08-13 06:15:09,012 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 20 from LS+wenet, 31 from Vox, 41 from AS
2024-08-13 06:15:34,202 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 14 from Vox, 46 from AS
2024-08-13 06:15:46,238 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.480e+01 2.712e+01 3.054e+01 1.079e+02, threshold=5.424e+01, percent-clipped=2.0
2024-08-13 06:15:54,402 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS
2024-08-13 06:15:56,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2028390.0, ans=0.0
2024-08-13 06:16:06,681 INFO [train_multi_KD3.py:1116] (0/4) Epoch 14, batch 14450, loss[loss=0.112, beats_loss=0.009322, ecapa_loss=0.0001813, whisper_loss=0.1009, over 22029.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01073, ecapa_loss=0.0001661, whisper_loss=0.09353, over 3930861.59 frames. ], batch size: 88, lr: 4.41e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:16:16,133 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 06:17:14,412 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-14.pt
2024-08-13 06:17:52,540 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 0, loss[loss=0.1259, beats_loss=0.00776, ecapa_loss=0.0001704, whisper_loss=0.1164, over 19440.00 frames. ], tot_loss[loss=0.1259, beats_loss=0.00776, ecapa_loss=0.0001704, whisper_loss=0.1164, over 19440.00 frames. ], batch size: 74, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:17:52,542 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 06:18:35,151 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005623, whisper_loss=0.2479, over 922467.00 frames.
2024-08-13 06:18:52,053 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on SV_voxceleb1: loss=0.004582, beats_loss=0, ecapa_loss=0.0004582, whisper_loss=0, over 939242.00 frames.
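The logged totals are consistent with the loss weights in the run configuration (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0): each reported loss equals beats_loss + 10 x ecapa_loss + whisper_loss. This is a sketch of that arithmetic checked against the validation records above, not the training code itself; `total_loss` is a hypothetical helper name.

```python
# Sketch (not the actual training code): recombine the per-task losses
# using the configured scales beats=1.0, ecapa=10.0, whisper=1.0.
def total_loss(beats: float, ecapa: float, whisper: float) -> float:
    # Weighted sum matching beats_loss_scale / ecapa_loss_scale / whisper_loss_scale
    return 1.0 * beats + 10.0 * ecapa + 1.0 * whisper

# Validation on ASR_libri: loss=0.2535, ecapa_loss=0.0005623, whisper_loss=0.2479
print(round(total_loss(0.0, 0.0005623, 0.2479), 4))   # -> 0.2535
# Validation on SV_voxceleb1: loss=0.004582, ecapa_loss=0.0004582
print(round(total_loss(0.0, 0.0004582, 0.0), 6))      # -> 0.004582
```

The same identity holds for the training records, e.g. batch 14100: 0.01229 + 10 x 0.0001703 + 0.08906 = 0.10305, which the log rounds to loss=0.1031.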
2024-08-13 06:20:54,423 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on AT_audioset: loss=0.02384, beats_loss=0.02384, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 06:20:54,427 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-13 06:21:04,326 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS
2024-08-13 06:21:04,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2028930.0, ans=0.04949747468305833
2024-08-13 06:21:16,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2028930.0, ans=0.2
2024-08-13 06:21:19,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-08-13 06:21:44,402 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 24 from LS+wenet, 12 from Vox, 32 from AS
2024-08-13 06:21:52,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2029130.0, ans=0.125
2024-08-13 06:22:37,207 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 from AS
2024-08-13 06:22:44,092 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 06:22:48,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.538e+01 2.901e+01 3.195e+01 5.923e+01, threshold=5.802e+01, percent-clipped=1.0
2024-08-13 06:22:51,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2029330.0, ans=0.2
2024-08-13 06:23:00,777 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 from AS
2024-08-13 06:23:02,951 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS
2024-08-13 06:23:05,532 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 50, loss[loss=0.09678, beats_loss=0.009779, ecapa_loss=0.000147, whisper_loss=0.08553, over 22628.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01002, ecapa_loss=0.0001715, whisper_loss=0.0906, over 863916.25 frames. ], batch size: 87, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:23:30,801 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 25 from Vox, 26 from AS
2024-08-13 06:23:45,653 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 17 from Vox, 40 from AS
2024-08-13 06:24:05,058 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 from AS
2024-08-13 06:24:15,530 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-13 06:24:23,940 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 06:24:31,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0
2024-08-13 06:24:37,739 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 from AS
2024-08-13 06:24:43,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0
2024-08-13 06:25:03,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0
2024-08-13 06:25:04,419 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 100, loss[loss=0.1264, beats_loss=0.007603, ecapa_loss=0.0002016, whisper_loss=0.1168, over 16848.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.009824, ecapa_loss=0.000168, whisper_loss=0.09359, over 1531841.54 frames. ], batch size: 65, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:25:11,882 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 from AS
2024-08-13 06:25:59,742 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 21 from LS+wenet, 27 from Vox, 33 from AS
2024-08-13 06:26:24,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2030230.0, ans=0.1
2024-08-13 06:26:38,870 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS
2024-08-13 06:26:42,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+01 2.792e+01 3.150e+01 3.564e+01 5.697e+01, threshold=6.299e+01, percent-clipped=0.0
2024-08-13 06:26:56,782 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 150, loss[loss=0.1111, beats_loss=0.008859, ecapa_loss=0.0001771, whisper_loss=0.1005, over 18617.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.009945, ecapa_loss=0.0001684, whisper_loss=0.09289, over 2025380.94 frames. ], batch size: 71, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:26:57,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2030430.0, ans=0.1
2024-08-13 06:27:03,782 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 from AS
2024-08-13 06:27:11,792 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 from AS
2024-08-13 06:27:15,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.78 vs. limit=22.5
2024-08-13 06:27:36,271 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 16 from Vox, 30 from AS
2024-08-13 06:27:44,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2030630.0, ans=0.0
2024-08-13 06:28:07,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2030730.0, ans=0.125
2024-08-13 06:28:13,025 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 21 from LS+wenet, 14 from Vox, 43 from AS
2024-08-13 06:28:27,545 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 200, loss[loss=0.1309, beats_loss=0.008574, ecapa_loss=0.0001871, whisper_loss=0.1205, over 20010.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01016, ecapa_loss=0.0001678, whisper_loss=0.09233, over 2408061.90 frames. ], batch size: 79, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:28:29,619 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS
2024-08-13 06:28:35,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2030930.0, ans=0.0
2024-08-13 06:28:45,042 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 from AS
2024-08-13 06:28:46,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2031030.0, ans=0.0
2024-08-13 06:28:49,596 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 from AS
2024-08-13 06:29:04,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2031130.0, ans=0.125
2024-08-13 06:29:13,557 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 from AS
2024-08-13 06:29:34,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2031330.0, ans=0.0
2024-08-13 06:29:38,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2031330.0, ans=0.0
2024-08-13 06:29:39,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.436e+01 2.755e+01 3.099e+01 4.760e+01, threshold=5.509e+01, percent-clipped=0.0
2024-08-13 06:29:51,976 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 250, loss[loss=0.0672, beats_loss=0.01208, ecapa_loss=0.000161, whisper_loss=0.05351, over 17844.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01027, ecapa_loss=0.0001666, whisper_loss=0.09167, over 2740566.30 frames. ], batch size: 74, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:30:04,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5
2024-08-13 06:30:19,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2031530.0, ans=0.0
2024-08-13 06:30:26,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2031630.0, ans=0.05
2024-08-13 06:30:30,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2031630.0, ans=0.1
2024-08-13 06:30:37,295 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-13 06:30:54,377 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 06:31:01,035 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 17 from LS+wenet, 11 from Vox, 35 from AS
2024-08-13 06:31:08,706 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 from AS
2024-08-13 06:31:12,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2031930.0, ans=0.125
2024-08-13 06:31:13,737 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 300, loss[loss=0.08658, beats_loss=0.01275, ecapa_loss=0.0001316, whisper_loss=0.07251, over 22219.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001667, whisper_loss=0.09016, over 2954959.93 frames. ], batch size: 85, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:31:23,242 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 from AS
2024-08-13 06:31:48,632 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06791721284389496, model_norm_threshold=55.09401321411133
2024-08-13 06:31:48,836 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.98, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.429e+05, grad_sumsq=7.164e+04, orig_rms_sq=8.974e+00
2024-08-13 06:32:07,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2032230.0, ans=0.125
2024-08-13 06:32:25,766 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.443e+01 2.713e+01 2.990e+01 8.112e+02, threshold=5.427e+01, percent-clipped=1.0
2024-08-13 06:32:29,594 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS
2024-08-13 06:32:37,011 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 350, loss[loss=0.09283, beats_loss=0.01114, ecapa_loss=0.0001644, whisper_loss=0.08005, over 18921.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001657, whisper_loss=0.09033, over 3145165.83 frames. ], batch size: 76, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:32:42,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-08-13 06:32:44,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0
2024-08-13 06:32:46,411 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 from AS
2024-08-13 06:32:57,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2032530.0, ans=0.2
2024-08-13 06:33:13,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2032630.0, ans=0.125
2024-08-13 06:33:19,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2032630.0, ans=0.1
2024-08-13 06:33:26,586 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS
2024-08-13 06:33:35,479 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 from AS
2024-08-13 06:33:37,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2032730.0, ans=0.2
2024-08-13 06:33:44,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2032830.0, ans=0.125
2024-08-13 06:33:44,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2032830.0, ans=0.125
2024-08-13 06:33:49,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0
2024-08-13 06:33:55,210 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 from AS
2024-08-13 06:33:57,757 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 400, loss[loss=0.1074, beats_loss=0.01058, ecapa_loss=0.000168, whisper_loss=0.09509, over 15979.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01062, ecapa_loss=0.0001655, whisper_loss=0.08966, over 3286728.11 frames. ], batch size: 64, lr: 4.26e-03, grad_scale: 5.764607523034235e+17
2024-08-13 06:33:59,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2032930.0, ans=0.0
2024-08-13 06:34:02,080 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS
2024-08-13 06:34:43,856 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 from AS
2024-08-13 06:34:56,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2033230.0, ans=0.125
2024-08-13 06:35:06,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.548e+01 2.826e+01 3.113e+01 9.410e+01, threshold=5.653e+01, percent-clipped=3.0
2024-08-13 06:35:15,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2033330.0, ans=0.125
2024-08-13 06:35:18,002 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 450, loss[loss=0.09788, beats_loss=0.01098, ecapa_loss=0.0001629, whisper_loss=0.08528, over 15176.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001643, whisper_loss=0.09066, over 3400454.48 frames. ], batch size: 58, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:35:18,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2033430.0, ans=0.0
2024-08-13 06:35:22,662 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 19 from LS+wenet, 29 from Vox, 44 from AS
2024-08-13 06:36:12,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0
2024-08-13 06:36:37,344 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 500, loss[loss=0.09236, beats_loss=0.01114, ecapa_loss=0.0001444, whisper_loss=0.07977, over 14861.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001649, whisper_loss=0.09079, over 3516554.99 frames. ], batch size: 55, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:36:43,420 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 from AS
2024-08-13 06:37:02,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.27 vs. limit=22.5
2024-08-13 06:37:26,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2034230.0, ans=0.0
2024-08-13 06:37:34,130 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 from AS
2024-08-13 06:37:37,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2034230.0, ans=0.125
2024-08-13 06:37:45,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.386e+01 2.704e+01 2.981e+01 6.756e+01, threshold=5.408e+01, percent-clipped=1.0
2024-08-13 06:37:45,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2034330.0, ans=0.0
2024-08-13 06:37:48,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2034330.0, ans=0.0
2024-08-13 06:37:56,427 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 550, loss[loss=0.1064, beats_loss=0.0112, ecapa_loss=0.0001708, whisper_loss=0.09348, over 18449.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001655, whisper_loss=0.09046, over 3595097.67 frames. ], batch size: 73, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:38:09,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2034430.0, ans=0.0
2024-08-13 06:38:51,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2034730.0, ans=0.125
2024-08-13 06:39:08,836 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS
2024-08-13 06:39:17,194 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 600, loss[loss=0.1101, beats_loss=0.01107, ecapa_loss=0.0001753, whisper_loss=0.09727, over 14140.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01057, ecapa_loss=0.0001642, whisper_loss=0.09172, over 3647021.35 frames. ], batch size: 56, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:39:19,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2034930.0, ans=0.0
2024-08-13 06:39:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2034930.0, ans=0.1
2024-08-13 06:39:34,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2035030.0, ans=0.125
2024-08-13 06:39:42,607 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 17 from Vox, 44 from AS
2024-08-13 06:40:14,999 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.587e-02
2024-08-13 06:40:15,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2035230.0, ans=0.1
2024-08-13 06:40:16,515 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 from AS
2024-08-13 06:40:25,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.434e+01 2.721e+01 3.072e+01 6.546e+01, threshold=5.441e+01, percent-clipped=1.0
2024-08-13 06:40:37,804 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 650, loss[loss=0.09671, beats_loss=0.01273, ecapa_loss=0.0001347, whisper_loss=0.08263, over 21987.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01062, ecapa_loss=0.0001647, whisper_loss=0.09125, over 3668975.66 frames. ], batch size: 89, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:40:55,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2035530.0, ans=0.125
2024-08-13 06:40:57,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2035530.0, ans=0.0
2024-08-13 06:40:59,005 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.064e-01
2024-08-13 06:41:20,351 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 28 from Vox, 22 from AS
2024-08-13 06:41:21,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2035630.0, ans=0.2
2024-08-13 06:41:41,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2035830.0, ans=0.125
2024-08-13 06:41:42,937 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 from AS
2024-08-13 06:41:52,587 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS
2024-08-13 06:41:54,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2035830.0, ans=0.0
2024-08-13 06:41:59,059 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 700, loss[loss=0.1006, beats_loss=0.009586, ecapa_loss=0.0001885, whisper_loss=0.08908, over 20409.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01064, ecapa_loss=0.0001652, whisper_loss=0.09084, over 3715648.49 frames. ], batch size: 83, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:42:19,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2036030.0, ans=0.0
2024-08-13 06:42:25,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2024-08-13 06:42:28,734 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS
2024-08-13 06:42:49,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2036230.0, ans=0.125
2024-08-13 06:43:03,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5
2024-08-13 06:43:08,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.352e+01 2.612e+01 3.001e+01 5.116e+01, threshold=5.224e+01, percent-clipped=0.0
2024-08-13 06:43:14,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2036330.0, ans=0.125
2024-08-13 06:43:19,763 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 750, loss[loss=0.1071, beats_loss=0.00882, ecapa_loss=0.000159, whisper_loss=0.09669, over 15463.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001652, whisper_loss=0.09039, over 3706683.26 frames. ], batch size: 60, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:43:21,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2036430.0, ans=0.125
2024-08-13 06:43:26,671 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 06:43:31,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2036430.0, ans=0.1
2024-08-13 06:43:36,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2036530.0, ans=0.125
2024-08-13 06:43:40,714 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 17 from LS+wenet, 21 from Vox, 49 from AS
2024-08-13 06:43:43,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2036530.0, ans=0.0
2024-08-13 06:43:51,818 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS
2024-08-13 06:43:59,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2036630.0, ans=0.0
2024-08-13 06:44:01,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2036630.0, ans=0.04949747468305833
2024-08-13 06:44:23,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2036830.0, ans=0.125
2024-08-13 06:44:24,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2036830.0, ans=0.07
2024-08-13 06:44:32,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2036830.0, ans=0.0
2024-08-13 06:44:37,080 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 800, loss[loss=0.1159, beats_loss=0.009959, ecapa_loss=0.000141, whisper_loss=0.1045, over 17258.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01074, ecapa_loss=0.000165, whisper_loss=0.09017, over 3726590.08 frames. ], batch size: 65, lr: 4.25e-03, grad_scale: 1.152921504606847e+18
2024-08-13 06:44:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2037030.0, ans=0.0
2024-08-13 06:44:59,300 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts.
16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-13 06:45:02,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2037030.0, ans=0.2 2024-08-13 06:45:08,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2037130.0, ans=0.125 2024-08-13 06:45:30,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2037230.0, ans=10.0 2024-08-13 06:45:34,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2037230.0, ans=0.0 2024-08-13 06:45:43,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.386e+01 2.631e+01 2.954e+01 1.989e+02, threshold=5.262e+01, percent-clipped=2.0 2024-08-13 06:45:44,857 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-13 06:45:51,036 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 06:45:53,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 850, loss[loss=0.1134, beats_loss=0.009301, ecapa_loss=0.0001539, whisper_loss=0.1026, over 20964.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01069, ecapa_loss=0.000165, whisper_loss=0.09042, over 3755771.54 frames. ], batch size: 81, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:46:13,657 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-13 06:46:15,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2037530.0, ans=0.2 2024-08-13 06:46:20,458 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 06:46:34,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2037630.0, ans=0.125 2024-08-13 06:46:40,142 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 06:46:41,603 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 06:46:52,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=22.5 2024-08-13 06:46:56,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2037830.0, ans=0.1 2024-08-13 06:47:10,185 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 900, loss[loss=0.1079, beats_loss=0.009987, ecapa_loss=0.0001706, whisper_loss=0.0962, over 21154.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01062, ecapa_loss=0.0001636, whisper_loss=0.0909, over 3788474.94 frames. ], batch size: 83, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:47:38,063 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-13 06:47:50,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2038130.0, ans=0.0 2024-08-13 06:48:02,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2038230.0, ans=0.125 2024-08-13 06:48:12,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+01 2.393e+01 2.649e+01 3.126e+01 8.192e+01, threshold=5.298e+01, percent-clipped=1.0 2024-08-13 06:48:15,171 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 06:48:19,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-13 06:48:22,627 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 950, loss[loss=0.1108, beats_loss=0.007981, ecapa_loss=0.0001977, whisper_loss=0.1008, over 17992.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001636, whisper_loss=0.09104, over 3807604.11 frames. ], batch size: 70, lr: 4.25e-03, grad_scale: 1.152921504606847e+18 2024-08-13 06:48:28,480 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-13 06:48:49,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2038530.0, ans=0.025 2024-08-13 06:49:12,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2038730.0, ans=0.125 2024-08-13 06:49:23,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2038730.0, ans=0.0 2024-08-13 06:49:32,328 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 06:49:34,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2038830.0, ans=0.125 2024-08-13 06:49:43,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2038930.0, ans=0.125 2024-08-13 06:49:44,438 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1000, loss[loss=0.1178, beats_loss=0.00983, ecapa_loss=0.0001804, whisper_loss=0.1062, over 18653.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0107, ecapa_loss=0.0001619, whisper_loss=0.09033, over 3825313.87 frames. 
], batch size: 74, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:50:07,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2039030.0, ans=0.125 2024-08-13 06:50:08,787 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 06:50:18,825 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 15 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 06:50:35,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-13 06:50:51,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2039330.0, ans=0.0 2024-08-13 06:50:55,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.424e+01 2.734e+01 3.160e+01 9.771e+01, threshold=5.467e+01, percent-clipped=3.0 2024-08-13 06:50:58,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-13 06:50:59,292 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 06:51:03,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2024-08-13 06:51:05,539 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1050, loss[loss=0.1027, beats_loss=0.01161, ecapa_loss=0.000115, whisper_loss=0.08993, over 18571.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01075, ecapa_loss=0.0001608, whisper_loss=0.08999, over 3815016.17 frames. 
], batch size: 70, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:51:07,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2039430.0, ans=0.1 2024-08-13 06:51:13,914 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 30 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 06:51:23,423 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-13 06:51:25,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2039530.0, ans=0.1 2024-08-13 06:51:27,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2039530.0, ans=0.125 2024-08-13 06:51:32,245 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 06:51:35,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2039630.0, ans=0.0 2024-08-13 06:51:56,752 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 06:52:08,414 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-13 06:52:11,420 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 06:52:18,160 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-13 06:52:20,555 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1100, loss[loss=0.09888, beats_loss=0.00973, ecapa_loss=0.0001548, whisper_loss=0.0876, over 17434.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01066, ecapa_loss=0.000161, whisper_loss=0.09109, over 3820141.21 frames. 
], batch size: 65, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:52:20,705 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-13 06:52:22,007 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-13 06:52:29,019 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-204000.pt 2024-08-13 06:52:34,963 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 06:52:57,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2040130.0, ans=0.125 2024-08-13 06:52:58,742 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-13 06:53:01,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5 2024-08-13 06:53:10,279 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 06:53:13,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2040230.0, ans=0.125 2024-08-13 06:53:24,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.366e+01 2.661e+01 3.055e+01 5.230e+01, threshold=5.322e+01, percent-clipped=0.0 2024-08-13 06:53:32,926 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1150, loss[loss=0.1427, beats_loss=0.006654, ecapa_loss=0.0001607, whisper_loss=0.1344, over 16481.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01068, ecapa_loss=0.0001602, whisper_loss=0.0919, over 3806019.04 frames. 
], batch size: 61, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:53:39,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2040430.0, ans=0.125 2024-08-13 06:53:54,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2040530.0, ans=0.125 2024-08-13 06:53:54,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-13 06:54:05,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2040630.0, ans=0.0 2024-08-13 06:54:05,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2040630.0, ans=0.125 2024-08-13 06:54:13,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2040630.0, ans=0.0 2024-08-13 06:54:17,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2040730.0, ans=0.125 2024-08-13 06:54:22,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2040730.0, ans=0.05 2024-08-13 06:54:38,967 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-13 06:54:44,982 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1200, loss[loss=0.08728, beats_loss=0.01147, ecapa_loss=0.0001366, whisper_loss=0.07444, over 19804.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01076, ecapa_loss=0.0001608, whisper_loss=0.09125, over 3815756.63 frames. 
], batch size: 74, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:54:51,873 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 06:55:00,408 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 06:55:07,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2041030.0, ans=0.2 2024-08-13 06:55:12,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2041130.0, ans=0.1 2024-08-13 06:55:33,913 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 06:55:45,528 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 06:55:46,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.369e+01 2.676e+01 3.078e+01 7.518e+01, threshold=5.351e+01, percent-clipped=1.0 2024-08-13 06:55:54,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1250, loss[loss=0.08259, beats_loss=0.0133, ecapa_loss=0.0001855, whisper_loss=0.06744, over 13947.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001597, whisper_loss=0.09078, over 3848599.05 frames. 
], batch size: 56, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:56:05,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2041430.0, ans=0.125 2024-08-13 06:56:08,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2041530.0, ans=0.125 2024-08-13 06:56:24,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2041630.0, ans=0.125 2024-08-13 06:56:29,302 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 06:56:32,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-13 06:56:35,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2041730.0, ans=0.0 2024-08-13 06:56:36,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2041730.0, ans=0.09899494936611666 2024-08-13 06:56:45,504 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 06:56:53,571 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 06:56:55,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2041830.0, ans=0.2 2024-08-13 06:57:00,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-08-13 06:57:01,250 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1300, loss[loss=0.1016, beats_loss=0.01177, ecapa_loss=0.0001578, whisper_loss=0.08828, over 22298.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01081, ecapa_loss=0.0001602, whisper_loss=0.09046, over 3846069.13 frames. ], batch size: 89, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:57:01,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2041930.0, ans=0.125 2024-08-13 06:57:55,359 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 15 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 06:57:59,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.323e+01 2.630e+01 3.145e+01 6.794e+01, threshold=5.259e+01, percent-clipped=2.0 2024-08-13 06:58:07,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1350, loss[loss=0.1039, beats_loss=0.01104, ecapa_loss=0.0001453, whisper_loss=0.09142, over 21515.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01092, ecapa_loss=0.0001602, whisper_loss=0.08953, over 3838317.88 frames. ], batch size: 84, lr: 4.25e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:58:11,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2042430.0, ans=0.125 2024-08-13 06:58:11,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2042430.0, ans=0.125 2024-08-13 06:58:13,730 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 06:58:31,974 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
19 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 06:58:50,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2042730.0, ans=0.1 2024-08-13 06:58:52,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2042730.0, ans=0.0 2024-08-13 06:58:57,235 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 06:58:58,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2042830.0, ans=0.0 2024-08-13 06:59:04,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2042830.0, ans=0.0 2024-08-13 06:59:13,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1400, loss[loss=0.106, beats_loss=0.011, ecapa_loss=0.0001602, whisper_loss=0.09344, over 22629.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01082, ecapa_loss=0.0001596, whisper_loss=0.08982, over 3836440.19 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 06:59:18,671 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 06:59:23,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2024-08-13 06:59:26,685 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 06:59:32,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2043030.0, ans=0.0 2024-08-13 06:59:34,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2043030.0, ans=0.1 2024-08-13 06:59:47,226 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-13 06:59:50,189 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-13 06:59:55,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2043230.0, ans=0.1 2024-08-13 07:00:02,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2043230.0, ans=10.0 2024-08-13 07:00:08,914 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 07:00:12,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.355e+01 2.665e+01 2.989e+01 4.736e+01, threshold=5.330e+01, percent-clipped=0.0 2024-08-13 07:00:20,717 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1450, loss[loss=0.07685, beats_loss=0.0137, ecapa_loss=0.0001715, whisper_loss=0.06144, over 21699.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01077, ecapa_loss=0.0001603, whisper_loss=0.0893, over 3779343.80 frames. ], batch size: 94, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:00:47,104 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
23 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-13 07:00:51,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2043430.0, ans=0.07 2024-08-13 07:01:00,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.818e-01 2024-08-13 07:01:06,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2043530.0, ans=0.04949747468305833 2024-08-13 07:01:14,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2043630.0, ans=15.0 2024-08-13 07:01:21,548 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 07:01:30,564 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:01:40,885 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-13 07:01:51,732 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1500, loss[loss=0.09991, beats_loss=0.01208, ecapa_loss=0.0001457, whisper_loss=0.08637, over 20645.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01081, ecapa_loss=0.0001589, whisper_loss=0.08928, over 3806336.95 frames. ], batch size: 83, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:01:52,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2043930.0, ans=0.0 2024-08-13 07:01:53,400 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:01:56,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. 
limit=15.0 2024-08-13 07:02:00,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2043930.0, ans=0.125 2024-08-13 07:02:06,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-13 07:02:12,203 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 07:02:13,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2044030.0, ans=0.125 2024-08-13 07:02:17,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-13 07:02:20,087 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 07:02:25,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2044130.0, ans=0.125 2024-08-13 07:02:35,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-13 07:02:46,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2044330.0, ans=0.1 2024-08-13 07:02:51,410 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.422e+01 2.612e+01 2.997e+01 7.275e+01, threshold=5.223e+01, percent-clipped=1.0 2024-08-13 07:02:59,620 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1550, loss[loss=0.1243, beats_loss=0.009757, ecapa_loss=0.0001837, whisper_loss=0.1127, over 18550.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01082, ecapa_loss=0.000159, whisper_loss=0.08991, over 3810450.81 frames. 
], batch size: 75, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:03:01,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2044430.0, ans=0.0 2024-08-13 07:03:18,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2044530.0, ans=0.0 2024-08-13 07:03:32,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2044630.0, ans=0.1 2024-08-13 07:03:34,781 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 07:03:56,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2044830.0, ans=0.1 2024-08-13 07:04:09,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1600, loss[loss=0.1059, beats_loss=0.007405, ecapa_loss=0.0001709, whisper_loss=0.09679, over 14530.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001586, whisper_loss=0.09054, over 3825902.53 frames. ], batch size: 58, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:04:16,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2044930.0, ans=0.035 2024-08-13 07:04:43,399 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
21 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 07:04:46,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2045130.0, ans=0.1 2024-08-13 07:05:10,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.418e+01 2.670e+01 2.986e+01 1.271e+02, threshold=5.339e+01, percent-clipped=4.0 2024-08-13 07:05:19,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2024-08-13 07:05:20,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1650, loss[loss=0.07036, beats_loss=0.01395, ecapa_loss=0.0002097, whisper_loss=0.05431, over 16830.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001617, whisper_loss=0.0905, over 3823888.89 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:05:22,795 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 31 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-13 07:05:25,284 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 07:05:30,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-08-13 07:05:38,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2045530.0, ans=0.125 2024-08-13 07:05:48,895 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 31 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 07:05:49,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2045630.0, ans=0.125 2024-08-13 07:06:15,942 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-13 07:06:29,145 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1700, loss[loss=0.07431, beats_loss=0.01261, ecapa_loss=0.0001789, whisper_loss=0.05991, over 18885.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0107, ecapa_loss=0.0001606, whisper_loss=0.09078, over 3821738.39 frames. ], batch size: 82, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:06:42,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2046030.0, ans=0.1 2024-08-13 07:06:49,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2046030.0, ans=0.125 2024-08-13 07:06:56,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2046130.0, ans=0.0 2024-08-13 07:07:05,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-13 07:07:15,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2046230.0, ans=0.0 2024-08-13 07:07:23,523 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-13 07:07:23,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. 
limit=15.0 2024-08-13 07:07:27,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2046330.0, ans=0.125 2024-08-13 07:07:31,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.428e+01 2.659e+01 3.089e+01 1.627e+02, threshold=5.319e+01, percent-clipped=1.0 2024-08-13 07:07:39,818 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1750, loss[loss=0.09305, beats_loss=0.01097, ecapa_loss=0.0001638, whisper_loss=0.08044, over 15519.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01079, ecapa_loss=0.0001593, whisper_loss=0.09045, over 3847501.51 frames. ], batch size: 60, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:07:59,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2046530.0, ans=0.125 2024-08-13 07:08:01,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2046530.0, ans=0.125 2024-08-13 07:08:10,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2046630.0, ans=0.125 2024-08-13 07:08:22,026 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-13 07:08:25,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2046730.0, ans=0.125 2024-08-13 07:08:45,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2046830.0, ans=10.0 2024-08-13 07:08:46,911 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 07:08:49,385 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1800, loss[loss=0.0901, beats_loss=0.009952, ecapa_loss=0.0001587, whisper_loss=0.07856, over 17333.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001599, whisper_loss=0.09052, over 3846351.15 frames. ], batch size: 68, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:08:59,842 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 07:09:05,201 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 07:09:15,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2047030.0, ans=0.1 2024-08-13 07:09:20,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2047130.0, ans=0.125 2024-08-13 07:09:30,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2047230.0, ans=0.0 2024-08-13 07:09:40,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2047230.0, ans=0.2 2024-08-13 07:09:40,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2047230.0, ans=0.2 2024-08-13 07:09:51,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.458e+01 2.695e+01 3.131e+01 5.479e+01, threshold=5.391e+01, percent-clipped=1.0 2024-08-13 07:09:59,905 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1850, loss[loss=0.1017, beats_loss=0.009672, ecapa_loss=0.0001518, whisper_loss=0.09049, over 20308.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0107, ecapa_loss=0.0001587, whisper_loss=0.09035, over 3822782.83 frames. ], batch size: 78, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:10:11,022 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 07:10:27,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2047630.0, ans=0.0 2024-08-13 07:10:29,911 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 07:10:31,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2047630.0, ans=0.2 2024-08-13 07:10:40,661 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 07:10:48,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2047730.0, ans=0.0 2024-08-13 07:10:57,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2047830.0, ans=0.125 2024-08-13 07:11:06,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2024-08-13 07:11:08,072 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1900, loss[loss=0.09019, beats_loss=0.00992, ecapa_loss=0.0001407, whisper_loss=0.07886, over 16812.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01075, ecapa_loss=0.0001592, whisper_loss=0.08923, over 3824210.16 frames. ], batch size: 66, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:11:23,453 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 07:11:35,886 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 07:11:36,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.65 vs. 
limit=15.0 2024-08-13 07:11:49,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-13 07:11:57,625 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 12 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 07:12:03,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2048330.0, ans=0.125 2024-08-13 07:12:03,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2048330.0, ans=0.0 2024-08-13 07:12:09,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.366e+01 2.636e+01 3.036e+01 8.197e+01, threshold=5.272e+01, percent-clipped=3.0 2024-08-13 07:12:17,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 1950, loss[loss=0.1164, beats_loss=0.00878, ecapa_loss=0.0001565, whisper_loss=0.1061, over 20401.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001613, whisper_loss=0.08998, over 3826052.44 frames. ], batch size: 77, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:12:25,039 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 07:12:47,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2048630.0, ans=0.125 2024-08-13 07:12:49,191 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-13 07:12:53,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2048630.0, ans=0.125 2024-08-13 07:13:07,146 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 07:13:10,456 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-13 07:13:15,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2048730.0, ans=0.125 2024-08-13 07:13:18,182 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-13 07:13:18,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2048830.0, ans=0.125 2024-08-13 07:13:26,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=12.0 2024-08-13 07:13:31,131 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 07:13:33,747 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2000, loss[loss=0.09734, beats_loss=0.007536, ecapa_loss=0.0001606, whisper_loss=0.0882, over 17770.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0107, ecapa_loss=0.0001605, whisper_loss=0.09009, over 3830214.44 frames. ], batch size: 66, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:13:37,222 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-13 07:13:39,989 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 07:13:52,760 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-13 07:13:57,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2049030.0, ans=0.2 2024-08-13 07:13:59,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2049030.0, ans=0.125 2024-08-13 07:13:59,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2049030.0, ans=0.125 2024-08-13 07:14:09,494 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 31 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 07:14:17,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2049130.0, ans=0.0 2024-08-13 07:14:25,370 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 31 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 07:14:36,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0 2024-08-13 07:14:37,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0 2024-08-13 07:14:42,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.431e+01 2.683e+01 2.951e+01 6.273e+01, threshold=5.366e+01, percent-clipped=2.0 2024-08-13 07:14:44,105 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-13 07:14:46,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2049330.0, ans=0.125 2024-08-13 07:14:51,647 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2050, loss[loss=0.09686, beats_loss=0.01076, ecapa_loss=0.000189, whisper_loss=0.08422, over 20997.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01069, ecapa_loss=0.0001611, whisper_loss=0.09075, over 3840046.98 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:14:52,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=12.0 2024-08-13 07:14:58,454 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-13 07:15:19,854 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.607e-02 2024-08-13 07:15:26,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2049630.0, ans=0.125 2024-08-13 07:15:29,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2049630.0, ans=0.0 2024-08-13 07:15:33,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2049630.0, ans=0.125 2024-08-13 07:16:08,705 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2100, loss[loss=0.09376, beats_loss=0.01183, ecapa_loss=0.0001653, whisper_loss=0.08028, over 22866.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01076, ecapa_loss=0.0001601, whisper_loss=0.09019, over 3813578.06 frames. ], batch size: 94, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:16:10,319 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 07:16:15,927 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 07:16:39,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2050130.0, ans=0.125 2024-08-13 07:16:48,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2050130.0, ans=0.0 2024-08-13 07:16:55,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-08-13 07:16:58,136 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 07:17:06,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2050230.0, ans=0.1 2024-08-13 07:17:15,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.374e+01 2.616e+01 2.948e+01 7.626e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-13 07:17:24,454 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2150, loss[loss=0.1256, beats_loss=0.007407, ecapa_loss=0.0001501, whisper_loss=0.1167, over 23748.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01076, ecapa_loss=0.0001604, whisper_loss=0.09046, over 3789287.06 frames. ], batch size: 87, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:17:26,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2050430.0, ans=0.0 2024-08-13 07:17:48,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2050530.0, ans=0.0 2024-08-13 07:18:37,858 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2200, loss[loss=0.1005, beats_loss=0.01208, ecapa_loss=0.0001268, whisper_loss=0.08712, over 20650.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01074, ecapa_loss=0.0001597, whisper_loss=0.09079, over 3802762.01 frames. ], batch size: 82, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:18:38,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2050930.0, ans=0.0 2024-08-13 07:18:39,561 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 07:18:45,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2050930.0, ans=0.0 2024-08-13 07:18:53,246 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-13 07:18:56,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2051030.0, ans=0.2 2024-08-13 07:18:56,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2051030.0, ans=0.0 2024-08-13 07:19:00,842 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-13 07:19:03,493 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 24 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 07:19:03,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2051030.0, ans=0.125 2024-08-13 07:19:06,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2051130.0, ans=0.1 2024-08-13 07:19:09,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2024-08-13 07:19:19,658 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
37 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-13 07:19:21,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2051230.0, ans=0.125 2024-08-13 07:19:30,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2051230.0, ans=0.1 2024-08-13 07:19:30,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2051230.0, ans=0.125 2024-08-13 07:19:43,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.405e+01 2.692e+01 3.101e+01 3.996e+01, threshold=5.385e+01, percent-clipped=0.0 2024-08-13 07:19:52,997 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2250, loss[loss=0.1036, beats_loss=0.01119, ecapa_loss=0.0001528, whisper_loss=0.09092, over 22670.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001614, whisper_loss=0.09142, over 3827759.18 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:19:58,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-13 07:20:09,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=22.5 2024-08-13 07:20:16,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2051530.0, ans=0.0 2024-08-13 07:20:18,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2051530.0, ans=0.025 2024-08-13 07:20:33,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.55 vs. 
limit=22.5 2024-08-13 07:20:52,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=12.0 2024-08-13 07:21:03,791 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-13 07:21:04,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2051830.0, ans=0.125 2024-08-13 07:21:10,957 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2300, loss[loss=0.1149, beats_loss=0.01146, ecapa_loss=0.0001542, whisper_loss=0.1019, over 23127.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001615, whisper_loss=0.09154, over 3853409.77 frames. ], batch size: 93, lr: 4.24e-03, grad_scale: 5.764607523034235e+17 2024-08-13 07:21:36,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2024-08-13 07:21:47,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2052130.0, ans=0.125 2024-08-13 07:21:47,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2052130.0, ans=0.125 2024-08-13 07:21:59,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2052230.0, ans=10.0 2024-08-13 07:22:15,293 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 07:22:18,512 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
13 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-13 07:22:20,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.551e+01 2.802e+01 3.286e+01 4.961e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 07:22:26,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2052430.0, ans=0.125 2024-08-13 07:22:27,887 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2350, loss[loss=0.07269, beats_loss=0.01155, ecapa_loss=0.0001569, whisper_loss=0.05957, over 13609.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01081, ecapa_loss=0.0001637, whisper_loss=0.09078, over 3843248.78 frames. ], batch size: 56, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:22:36,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-13 07:22:38,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2052430.0, ans=0.0 2024-08-13 07:23:24,360 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 07:23:29,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2052830.0, ans=0.2 2024-08-13 07:23:39,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-13 07:23:43,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2400, loss[loss=0.1378, beats_loss=0.007681, ecapa_loss=0.0001883, whisper_loss=0.1283, over 20163.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01078, ecapa_loss=0.0001644, whisper_loss=0.09097, over 3837567.76 frames. 
], batch size: 79, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:23:54,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5 2024-08-13 07:24:02,709 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 23 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-13 07:24:13,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-08-13 07:24:20,044 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 07:24:28,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2053230.0, ans=0.09899494936611666 2024-08-13 07:24:35,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2053230.0, ans=0.125 2024-08-13 07:24:41,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2053230.0, ans=0.2 2024-08-13 07:24:45,688 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-13 07:24:49,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2053330.0, ans=0.125 2024-08-13 07:24:52,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.349e+01 2.648e+01 3.316e+01 5.305e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:24:56,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2053330.0, ans=0.0 2024-08-13 07:25:00,253 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2450, loss[loss=0.1051, beats_loss=0.009761, ecapa_loss=0.0001689, whisper_loss=0.09363, over 14814.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01072, ecapa_loss=0.0001636, whisper_loss=0.09141, over 3874404.12 frames. ], batch size: 56, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:25:00,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2053430.0, ans=0.125 2024-08-13 07:25:11,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-13 07:25:25,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-08-13 07:25:42,471 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 07:25:56,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2053730.0, ans=0.125 2024-08-13 07:25:56,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2053730.0, ans=0.125 2024-08-13 07:25:58,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2053730.0, ans=0.0 2024-08-13 07:25:59,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2053830.0, ans=0.125 2024-08-13 07:26:10,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2053830.0, ans=0.05 2024-08-13 07:26:16,237 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2500, loss[loss=0.1086, beats_loss=0.01021, ecapa_loss=0.0001469, whisper_loss=0.09688, over 23414.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0107, ecapa_loss=0.0001648, whisper_loss=0.09109, over 3872908.00 frames. 
], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:26:19,498 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 07:26:21,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2053930.0, ans=0.125 2024-08-13 07:26:25,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2053930.0, ans=0.2 2024-08-13 07:26:50,184 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 31 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-13 07:27:08,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2054230.0, ans=0.1 2024-08-13 07:27:13,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-13 07:27:25,266 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.432e+01 2.694e+01 2.986e+01 7.508e+01, threshold=5.387e+01, percent-clipped=1.0 2024-08-13 07:27:27,563 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 07:27:33,206 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2550, loss[loss=0.102, beats_loss=0.01189, ecapa_loss=0.0001737, whisper_loss=0.08836, over 22819.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01064, ecapa_loss=0.0001649, whisper_loss=0.09198, over 3887098.88 frames. ], batch size: 94, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:27:36,676 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.717e+05 2024-08-13 07:27:39,428 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-13 07:27:50,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2054530.0, ans=0.0 2024-08-13 07:27:56,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2054530.0, ans=0.125 2024-08-13 07:28:02,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-08-13 07:28:04,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2054630.0, ans=0.2 2024-08-13 07:28:23,650 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-13 07:28:30,908 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-13 07:28:47,198 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2600, loss[loss=0.1056, beats_loss=0.008999, ecapa_loss=0.0001563, whisper_loss=0.09502, over 19366.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01063, ecapa_loss=0.0001653, whisper_loss=0.09215, over 3879656.02 frames. ], batch size: 73, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:29:34,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2055230.0, ans=0.0 2024-08-13 07:29:56,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.419e+01 2.702e+01 3.112e+01 4.104e+01, threshold=5.404e+01, percent-clipped=0.0 2024-08-13 07:30:03,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. 
limit=15.0 2024-08-13 07:30:03,669 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2650, loss[loss=0.08091, beats_loss=0.01245, ecapa_loss=0.0001775, whisper_loss=0.06669, over 17981.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01067, ecapa_loss=0.0001651, whisper_loss=0.09179, over 3859723.78 frames. ], batch size: 76, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:30:07,898 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 07:30:09,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-08-13 07:30:11,947 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 07:31:00,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2024-08-13 07:31:01,122 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-13 07:31:20,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2700, loss[loss=0.1127, beats_loss=0.008458, ecapa_loss=0.0002021, whisper_loss=0.1022, over 15765.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01073, ecapa_loss=0.0001649, whisper_loss=0.09174, over 3889417.47 frames. ], batch size: 62, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:31:20,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2055930.0, ans=0.125 2024-08-13 07:31:20,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-13 07:31:21,573 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
37 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-13 07:31:43,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2024-08-13 07:32:03,843 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 07:32:10,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2056230.0, ans=0.0 2024-08-13 07:32:13,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2056230.0, ans=0.2 2024-08-13 07:32:25,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2056330.0, ans=0.0 2024-08-13 07:32:28,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.002e+01 2.371e+01 2.713e+01 3.227e+01 1.003e+02, threshold=5.426e+01, percent-clipped=2.0 2024-08-13 07:32:36,808 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2750, loss[loss=0.1107, beats_loss=0.01003, ecapa_loss=0.0001575, whisper_loss=0.09907, over 19401.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001659, whisper_loss=0.09141, over 3865227.08 frames. 
], batch size: 74, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:32:48,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2056430.0, ans=0.125 2024-08-13 07:33:06,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2056630.0, ans=0.125 2024-08-13 07:33:29,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2056730.0, ans=0.125 2024-08-13 07:33:45,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2056830.0, ans=0.1 2024-08-13 07:33:47,721 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 07:33:55,693 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2800, loss[loss=0.08656, beats_loss=0.0084, ecapa_loss=0.0002227, whisper_loss=0.07593, over 12516.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01077, ecapa_loss=0.0001656, whisper_loss=0.09154, over 3858942.24 frames. ], batch size: 54, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:33:55,868 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 22 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-13 07:33:57,133 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 07:34:02,132 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 07:34:15,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. 
limit=15.0 2024-08-13 07:34:33,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2057130.0, ans=0.5 2024-08-13 07:34:41,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2057130.0, ans=15.0 2024-08-13 07:34:45,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2057230.0, ans=0.125 2024-08-13 07:34:49,134 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 07:34:51,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.76 vs. limit=10.0 2024-08-13 07:34:58,592 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 07:35:08,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.448e+01 2.685e+01 2.951e+01 5.516e+01, threshold=5.370e+01, percent-clipped=1.0 2024-08-13 07:35:12,977 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 07:35:15,867 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2850, loss[loss=0.0804, beats_loss=0.01279, ecapa_loss=0.0001659, whisper_loss=0.06595, over 15320.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.000165, whisper_loss=0.09104, over 3848637.49 frames. ], batch size: 61, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:35:51,179 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-13 07:36:01,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2057630.0, ans=0.125 2024-08-13 07:36:04,154 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 07:36:14,657 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-13 07:36:27,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2057830.0, ans=0.125 2024-08-13 07:36:30,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2057830.0, ans=0.1 2024-08-13 07:36:36,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2024-08-13 07:36:38,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2900, loss[loss=0.08755, beats_loss=0.01118, ecapa_loss=0.0001631, whisper_loss=0.07474, over 16581.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001647, whisper_loss=0.0912, over 3862593.45 frames. 
], batch size: 66, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:36:38,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2057930.0, ans=0.1 2024-08-13 07:36:48,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2057930.0, ans=0.04949747468305833 2024-08-13 07:37:09,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2058130.0, ans=0.0 2024-08-13 07:37:15,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2058130.0, ans=0.125 2024-08-13 07:37:27,345 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-13 07:37:49,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.692e+01 3.123e+01 5.434e+01, threshold=5.383e+01, percent-clipped=1.0 2024-08-13 07:37:58,365 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 2950, loss[loss=0.09802, beats_loss=0.01259, ecapa_loss=0.0001989, whisper_loss=0.08344, over 20722.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001651, whisper_loss=0.09141, over 3874223.06 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:38:12,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2058430.0, ans=0.1 2024-08-13 07:38:18,630 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 07:38:22,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2058530.0, ans=0.125 2024-08-13 07:38:29,240 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-13 07:38:34,804 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 07:38:36,531 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 07:38:57,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2058730.0, ans=0.2 2024-08-13 07:39:01,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2058730.0, ans=0.125 2024-08-13 07:39:09,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2058830.0, ans=0.2 2024-08-13 07:39:23,161 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3000, loss[loss=0.09019, beats_loss=0.0115, ecapa_loss=0.0001709, whisper_loss=0.07698, over 22317.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01081, ecapa_loss=0.000166, whisper_loss=0.09184, over 3858823.09 frames. ], batch size: 93, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:39:23,162 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 07:40:01,981 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on ASR_libri: loss=0.2552, beats_loss=0, ecapa_loss=0.0005768, whisper_loss=0.2494, over 922467.00 frames. 2024-08-13 07:40:19,509 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on SV_voxceleb1: loss=0.00457, beats_loss=0, ecapa_loss=0.000457, whisper_loss=0, over 939242.00 frames. 2024-08-13 07:40:46,807 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8891, 2.7317, 2.5614, 2.5162], device='cuda:0') 2024-08-13 07:42:09,733 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-13 07:42:09,737 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 07:42:18,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2058930.0, ans=0.2 2024-08-13 07:42:40,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2059030.0, ans=0.0 2024-08-13 07:42:54,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2059130.0, ans=0.0 2024-08-13 07:43:00,884 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 07:43:15,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2059230.0, ans=0.05 2024-08-13 07:43:20,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2059330.0, ans=0.0 2024-08-13 07:43:30,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.520e+01 2.888e+01 3.342e+01 5.667e+01, threshold=5.776e+01, percent-clipped=1.0 2024-08-13 07:43:38,315 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3050, loss[loss=0.07377, beats_loss=0.01081, ecapa_loss=0.0002124, whisper_loss=0.06084, over 14262.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001659, whisper_loss=0.09185, over 3869568.64 frames. ], batch size: 60, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:43:51,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.13 vs. 
limit=15.0 2024-08-13 07:43:59,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2059530.0, ans=0.125 2024-08-13 07:44:03,788 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 07:44:04,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-13 07:44:06,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=12.0 2024-08-13 07:44:07,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-08-13 07:44:58,166 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 07:45:00,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2059830.0, ans=0.125 2024-08-13 07:45:04,422 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3100, loss[loss=0.1087, beats_loss=0.01166, ecapa_loss=0.0001669, whisper_loss=0.09538, over 21978.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01087, ecapa_loss=0.0001656, whisper_loss=0.092, over 3888381.95 frames. 
], batch size: 88, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:45:08,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2059930.0, ans=0.125 2024-08-13 07:45:37,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2060130.0, ans=0.125 2024-08-13 07:45:37,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2060130.0, ans=0.2 2024-08-13 07:45:45,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2060130.0, ans=0.1 2024-08-13 07:46:00,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2060230.0, ans=0.0 2024-08-13 07:46:04,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2060230.0, ans=0.125 2024-08-13 07:46:13,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-13 07:46:15,614 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-13 07:46:15,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2060330.0, ans=0.07 2024-08-13 07:46:22,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.355e+01 2.648e+01 2.914e+01 4.175e+01, threshold=5.296e+01, percent-clipped=0.0 2024-08-13 07:46:29,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2024-08-13 07:46:30,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3150, loss[loss=0.1313, beats_loss=0.008426, ecapa_loss=0.000232, whisper_loss=0.1206, over 17544.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0109, ecapa_loss=0.0001663, whisper_loss=0.09247, over 3902086.42 frames. ], batch size: 71, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:46:36,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2060430.0, ans=0.0 2024-08-13 07:46:42,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2060430.0, ans=0.1 2024-08-13 07:46:51,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-08-13 07:47:06,202 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 07:47:11,297 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 07:47:16,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2060630.0, ans=0.0 2024-08-13 07:47:18,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-13 07:47:28,857 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 07:47:41,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2060830.0, ans=10.0 2024-08-13 07:47:58,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2060930.0, ans=0.125 2024-08-13 07:47:59,469 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3200, loss[loss=0.109, beats_loss=0.009772, ecapa_loss=0.0001446, whisper_loss=0.09775, over 20464.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001657, whisper_loss=0.09164, over 3886278.92 frames. ], batch size: 78, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:48:07,468 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-13 07:48:15,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2060930.0, ans=0.0 2024-08-13 07:48:16,751 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-13 07:48:32,092 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 07:48:38,754 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 07:48:39,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-13 07:48:43,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2061130.0, ans=0.09899494936611666 2024-08-13 07:49:01,816 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 07:49:02,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2061230.0, ans=0.125 2024-08-13 07:49:13,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2061330.0, ans=0.1 2024-08-13 07:49:16,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2061330.0, ans=0.125 2024-08-13 07:49:17,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.368e+01 2.637e+01 3.039e+01 1.272e+02, threshold=5.274e+01, percent-clipped=1.0 2024-08-13 07:49:17,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2061330.0, ans=0.0 2024-08-13 07:49:19,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2061330.0, ans=0.125 2024-08-13 07:49:25,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3250, loss[loss=0.1162, beats_loss=0.008995, ecapa_loss=0.0002106, whisper_loss=0.1051, over 21619.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01075, ecapa_loss=0.0001669, whisper_loss=0.0926, over 3867377.61 frames. ], batch size: 88, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:49:31,781 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 07:49:35,789 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 07:49:55,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=15.0 2024-08-13 07:50:39,533 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
35 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 07:50:49,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2061930.0, ans=0.125 2024-08-13 07:50:50,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3300, loss[loss=0.1046, beats_loss=0.0123, ecapa_loss=0.000136, whisper_loss=0.09095, over 22354.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01069, ecapa_loss=0.0001651, whisper_loss=0.09296, over 3857017.49 frames. ], batch size: 89, lr: 4.23e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:51:13,483 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-13 07:51:43,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2062230.0, ans=0.125 2024-08-13 07:51:51,142 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 07:51:56,874 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 07:52:07,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-13 07:52:08,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.435e+01 2.813e+01 3.312e+01 6.245e+01, threshold=5.626e+01, percent-clipped=3.0 2024-08-13 07:52:15,991 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3350, loss[loss=0.08509, beats_loss=0.01278, ecapa_loss=0.0001891, whisper_loss=0.07041, over 18869.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01071, ecapa_loss=0.0001652, whisper_loss=0.09328, over 3864428.06 frames. ], batch size: 79, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:52:20,638 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 07:52:27,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2062430.0, ans=0.0 2024-08-13 07:52:36,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-13 07:53:06,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-13 07:53:10,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2062730.0, ans=0.1 2024-08-13 07:53:20,373 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 07:53:31,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2062830.0, ans=0.125 2024-08-13 07:53:38,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3400, loss[loss=0.1229, beats_loss=0.01084, ecapa_loss=0.0001901, whisper_loss=0.1101, over 21418.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01079, ecapa_loss=0.0001642, whisper_loss=0.09244, over 3876204.29 frames. ], batch size: 84, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:53:57,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2063030.0, ans=0.125 2024-08-13 07:54:02,669 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 07:54:15,718 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 07:54:19,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.88 vs. 
limit=15.0 2024-08-13 07:54:44,187 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-13 07:54:47,926 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 07:54:53,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.911e+01 2.342e+01 2.542e+01 2.769e+01 4.852e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-13 07:54:53,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2063330.0, ans=10.0 2024-08-13 07:55:00,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3450, loss[loss=0.09752, beats_loss=0.01106, ecapa_loss=0.0001704, whisper_loss=0.08475, over 19677.00 frames. ], tot_loss[loss=0.104, beats_loss=0.0108, ecapa_loss=0.0001649, whisper_loss=0.09154, over 3877210.07 frames. ], batch size: 82, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:55:06,329 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-13 07:55:17,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2063530.0, ans=0.125 2024-08-13 07:55:23,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. 
limit=22.5 2024-08-13 07:55:24,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2063530.0, ans=0.125 2024-08-13 07:55:37,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2063630.0, ans=0.125 2024-08-13 07:55:44,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2063630.0, ans=0.125 2024-08-13 07:55:50,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2063730.0, ans=0.2 2024-08-13 07:55:52,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2063730.0, ans=0.5 2024-08-13 07:55:57,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2063730.0, ans=0.125 2024-08-13 07:56:13,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2063830.0, ans=0.0 2024-08-13 07:56:22,711 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3500, loss[loss=0.08833, beats_loss=0.01273, ecapa_loss=0.0001501, whisper_loss=0.0741, over 22676.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001641, whisper_loss=0.09177, over 3901939.73 frames. ], batch size: 92, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:56:26,230 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-13 07:57:06,492 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 07:57:06,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2064130.0, ans=0.05 2024-08-13 07:57:09,750 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 16 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 07:57:10,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2064130.0, ans=0.2 2024-08-13 07:57:11,026 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 17 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 07:57:16,104 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-13 07:57:37,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.517e+01 2.798e+01 3.148e+01 5.290e+01, threshold=5.596e+01, percent-clipped=1.0 2024-08-13 07:57:40,831 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 07:57:45,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2064430.0, ans=0.0 2024-08-13 07:57:47,224 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3550, loss[loss=0.0992, beats_loss=0.0118, ecapa_loss=0.0001845, whisper_loss=0.08556, over 19278.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01074, ecapa_loss=0.000165, whisper_loss=0.09154, over 3893107.13 frames. ], batch size: 81, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:57:48,742 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
29 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 07:57:48,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2064430.0, ans=10.0 2024-08-13 07:57:48,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2064430.0, ans=0.0 2024-08-13 07:57:54,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.63 vs. limit=22.5 2024-08-13 07:57:57,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2064430.0, ans=0.125 2024-08-13 07:58:11,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2064530.0, ans=0.125 2024-08-13 07:58:14,958 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 18 from LS+wenet, 36 from Vox, 39 fro AS 2024-08-13 07:58:20,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2064630.0, ans=0.125 2024-08-13 07:58:47,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2064730.0, ans=0.0 2024-08-13 07:59:11,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3600, loss[loss=0.1044, beats_loss=0.009683, ecapa_loss=0.0001546, whisper_loss=0.09318, over 23446.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01073, ecapa_loss=0.0001647, whisper_loss=0.09144, over 3897975.63 frames. ], batch size: 91, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 07:59:13,413 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-13 07:59:40,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2065030.0, ans=0.0 2024-08-13 07:59:49,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2065130.0, ans=0.125 2024-08-13 07:59:51,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2065130.0, ans=0.1 2024-08-13 08:00:01,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2065230.0, ans=0.0 2024-08-13 08:00:11,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2024-08-13 08:00:17,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-08-13 08:00:27,254 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.371e+01 2.703e+01 3.040e+01 5.839e+01, threshold=5.406e+01, percent-clipped=1.0 2024-08-13 08:00:36,740 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3650, loss[loss=0.1019, beats_loss=0.01235, ecapa_loss=0.0001819, whisper_loss=0.08775, over 21169.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01073, ecapa_loss=0.0001647, whisper_loss=0.09129, over 3865807.38 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:00:39,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2065430.0, ans=0.1 2024-08-13 08:00:39,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. 
limit=15.0 2024-08-13 08:00:51,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2065430.0, ans=0.1 2024-08-13 08:01:06,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-13 08:01:07,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2065530.0, ans=0.125 2024-08-13 08:01:14,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2065630.0, ans=0.0 2024-08-13 08:01:19,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.81 vs. limit=22.5 2024-08-13 08:01:23,275 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 15 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 08:01:23,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2065630.0, ans=0.125 2024-08-13 08:01:24,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.25 vs. limit=10.0 2024-08-13 08:01:32,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2065730.0, ans=0.125 2024-08-13 08:01:37,840 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 
17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-13 08:01:38,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2065730.0, ans=0.0 2024-08-13 08:01:53,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2065830.0, ans=0.1 2024-08-13 08:02:01,028 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3700, loss[loss=0.1007, beats_loss=0.01099, ecapa_loss=0.0001872, whisper_loss=0.08782, over 22025.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001648, whisper_loss=0.09173, over 3872462.18 frames. ], batch size: 93, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:02:21,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-08-13 08:02:38,982 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-13 08:02:41,907 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 21 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-13 08:02:55,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2066230.0, ans=10.0 2024-08-13 08:02:56,230 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 08:02:57,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2066230.0, ans=0.0 2024-08-13 08:03:10,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2066330.0, ans=0.125 2024-08-13 08:03:13,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.360e+01 2.624e+01 2.875e+01 4.532e+01, threshold=5.249e+01, percent-clipped=0.0 2024-08-13 08:03:19,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2066430.0, ans=0.125 2024-08-13 08:03:20,804 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3750, loss[loss=0.09458, beats_loss=0.01162, ecapa_loss=0.0001622, whisper_loss=0.08134, over 18524.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01087, ecapa_loss=0.0001637, whisper_loss=0.09142, over 3867212.67 frames. ], batch size: 75, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:03:46,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2066530.0, ans=0.0 2024-08-13 08:03:48,379 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 08:03:52,856 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 08:03:59,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. 
limit=6.0 2024-08-13 08:04:02,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2066630.0, ans=0.125 2024-08-13 08:04:02,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-13 08:04:31,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2066830.0, ans=0.1 2024-08-13 08:04:43,653 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3800, loss[loss=0.0947, beats_loss=0.01066, ecapa_loss=0.0001691, whisper_loss=0.08235, over 16353.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001648, whisper_loss=0.09096, over 3849890.17 frames. ], batch size: 63, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:05:04,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2024-08-13 08:05:05,629 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 08:05:06,913 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 37 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 08:05:11,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2067030.0, ans=0.0 2024-08-13 08:05:12,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. 
limit=10.0 2024-08-13 08:05:27,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2067130.0, ans=0.1 2024-08-13 08:05:31,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2067230.0, ans=0.125 2024-08-13 08:05:43,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2067230.0, ans=0.1 2024-08-13 08:05:49,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2067330.0, ans=0.1 2024-08-13 08:05:49,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2067330.0, ans=0.0 2024-08-13 08:05:49,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2067330.0, ans=0.125 2024-08-13 08:05:55,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+01 2.450e+01 2.726e+01 3.001e+01 5.077e+01, threshold=5.452e+01, percent-clipped=0.0 2024-08-13 08:06:03,508 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3850, loss[loss=0.1118, beats_loss=0.01195, ecapa_loss=0.0001626, whisper_loss=0.09823, over 20156.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01098, ecapa_loss=0.0001631, whisper_loss=0.09111, over 3848551.94 frames. ], batch size: 80, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:06:10,910 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-13 08:06:53,595 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 08:06:53,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2067730.0, ans=0.125 2024-08-13 08:07:11,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2067830.0, ans=0.2 2024-08-13 08:07:11,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2067830.0, ans=0.1 2024-08-13 08:07:25,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2067830.0, ans=0.125 2024-08-13 08:07:26,916 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 39 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 08:07:29,592 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3900, loss[loss=0.1344, beats_loss=0.009174, ecapa_loss=0.0001985, whisper_loss=0.1233, over 21769.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01091, ecapa_loss=0.0001645, whisper_loss=0.09258, over 3894081.76 frames. ], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:07:40,383 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 08:07:50,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. 
limit=12.0 2024-08-13 08:08:05,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2068130.0, ans=0.07 2024-08-13 08:08:10,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2068130.0, ans=0.125 2024-08-13 08:08:33,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2068330.0, ans=0.125 2024-08-13 08:08:43,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.060e+01 2.513e+01 2.786e+01 3.251e+01 6.128e+01, threshold=5.571e+01, percent-clipped=2.0 2024-08-13 08:08:52,105 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 3950, loss[loss=0.09018, beats_loss=0.01025, ecapa_loss=0.000217, whisper_loss=0.07776, over 15820.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.01085, ecapa_loss=0.0001662, whisper_loss=0.09304, over 3905039.31 frames. ], batch size: 68, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:08:52,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2068430.0, ans=10.0 2024-08-13 08:08:58,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2068430.0, ans=0.125 2024-08-13 08:08:58,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.15 vs. 
limit=22.5 2024-08-13 08:09:51,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2068730.0, ans=0.0 2024-08-13 08:10:00,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2068730.0, ans=0.125 2024-08-13 08:10:02,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5 2024-08-13 08:10:13,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2068830.0, ans=0.125 2024-08-13 08:10:20,832 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4000, loss[loss=0.1231, beats_loss=0.008485, ecapa_loss=0.0001944, whisper_loss=0.1126, over 22482.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01085, ecapa_loss=0.0001663, whisper_loss=0.09289, over 3915501.31 frames. ], batch size: 88, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:10:20,972 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-13 08:10:21,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2068930.0, ans=0.125 2024-08-13 08:10:42,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2069030.0, ans=0.125 2024-08-13 08:10:51,349 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
32 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 08:10:54,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2069130.0, ans=0.125 2024-08-13 08:10:54,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2069130.0, ans=0.125 2024-08-13 08:11:06,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2069130.0, ans=0.125 2024-08-13 08:11:16,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2024-08-13 08:11:21,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2069230.0, ans=0.025 2024-08-13 08:11:25,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2069330.0, ans=0.125 2024-08-13 08:11:32,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2069330.0, ans=0.09899494936611666 2024-08-13 08:11:34,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.360e+01 2.605e+01 2.925e+01 4.033e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 08:11:37,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.98 vs. limit=6.0 2024-08-13 08:11:38,428 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 08:11:43,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4050, loss[loss=0.1071, beats_loss=0.008914, ecapa_loss=0.0001828, whisper_loss=0.09639, over 19926.00 frames. 
], tot_loss[loss=0.1053, beats_loss=0.0108, ecapa_loss=0.0001661, whisper_loss=0.09281, over 3889392.29 frames. ], batch size: 78, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:11:48,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2069430.0, ans=0.125 2024-08-13 08:12:54,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2069830.0, ans=0.125 2024-08-13 08:13:08,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2069930.0, ans=0.125 2024-08-13 08:13:09,746 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4100, loss[loss=0.1153, beats_loss=0.009813, ecapa_loss=0.0001835, whisper_loss=0.1037, over 20246.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01079, ecapa_loss=0.0001669, whisper_loss=0.09266, over 3876073.59 frames. ], batch size: 79, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:13:13,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2069930.0, ans=0.1 2024-08-13 08:13:15,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2024-08-13 08:13:21,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2069930.0, ans=0.125 2024-08-13 08:13:39,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2070030.0, ans=0.0 2024-08-13 08:13:42,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.44 vs. 
limit=15.0 2024-08-13 08:13:51,228 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-13 08:14:05,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2070230.0, ans=0.125 2024-08-13 08:14:11,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2070230.0, ans=0.0 2024-08-13 08:14:17,429 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 08:14:17,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-08-13 08:14:17,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0 2024-08-13 08:14:20,042 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 16 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-13 08:14:24,745 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.381e+01 2.702e+01 3.113e+01 4.589e+01, threshold=5.403e+01, percent-clipped=0.0 2024-08-13 08:14:33,753 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4150, loss[loss=0.1236, beats_loss=0.01129, ecapa_loss=0.0001463, whisper_loss=0.1109, over 22085.00 frames. ], tot_loss[loss=0.1058, beats_loss=0.01078, ecapa_loss=0.0001669, whisper_loss=0.09338, over 3886541.97 frames. 
], batch size: 86, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:14:35,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2070430.0, ans=0.0 2024-08-13 08:14:40,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2070430.0, ans=0.1 2024-08-13 08:14:46,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2070430.0, ans=0.125 2024-08-13 08:14:48,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2070530.0, ans=0.0 2024-08-13 08:14:50,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2024-08-13 08:15:17,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2070630.0, ans=0.95 2024-08-13 08:15:19,338 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 08:15:24,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2070730.0, ans=0.125 2024-08-13 08:15:36,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2024-08-13 08:15:50,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-13 08:15:56,113 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4200, loss[loss=0.1105, beats_loss=0.009968, ecapa_loss=0.0001842, whisper_loss=0.09866, over 20217.00 frames. 
], tot_loss[loss=0.1064, beats_loss=0.01078, ecapa_loss=0.0001669, whisper_loss=0.09392, over 3914416.27 frames. ], batch size: 84, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:16:05,322 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 08:16:05,835 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.306e-02 2024-08-13 08:16:13,530 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-13 08:16:29,505 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 08:16:38,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071130.0, ans=0.1 2024-08-13 08:16:41,317 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 08:16:59,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2071230.0, ans=0.125 2024-08-13 08:17:00,908 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-13 08:17:10,990 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.330e+01 2.608e+01 3.052e+01 6.792e+01, threshold=5.217e+01, percent-clipped=3.0 2024-08-13 08:17:18,443 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4250, loss[loss=0.1334, beats_loss=0.008269, ecapa_loss=0.0001814, whisper_loss=0.1233, over 22436.00 frames. ], tot_loss[loss=0.1059, beats_loss=0.01083, ecapa_loss=0.0001657, whisper_loss=0.09345, over 3935092.16 frames. 
], batch size: 89, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:17:26,430 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:17:33,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2071430.0, ans=0.0 2024-08-13 08:17:38,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2071530.0, ans=0.1 2024-08-13 08:17:38,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2071530.0, ans=0.05 2024-08-13 08:17:49,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-13 08:18:02,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-13 08:18:10,647 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 18 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-13 08:18:17,838 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 14 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 08:18:26,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2024-08-13 08:18:27,027 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 08:18:30,074 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-13 08:18:37,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2071830.0, ans=15.0 2024-08-13 08:18:40,849 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4300, loss[loss=0.105, beats_loss=0.00908, ecapa_loss=0.0001884, whisper_loss=0.09403, over 21418.00 frames. ], tot_loss[loss=0.105, beats_loss=0.01085, ecapa_loss=0.0001649, whisper_loss=0.09253, over 3876106.19 frames. ], batch size: 85, lr: 4.22e-03, grad_scale: 2.8823037615171174e+17 2024-08-13 08:18:44,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2071930.0, ans=0.125 2024-08-13 08:18:46,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2071930.0, ans=0.0 2024-08-13 08:18:53,457 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-13 08:19:09,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-13 08:19:23,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-13 08:19:53,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.447e+01 2.714e+01 2.965e+01 4.296e+01, threshold=5.429e+01, percent-clipped=0.0 2024-08-13 08:19:56,698 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-13 08:19:58,099 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
18 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-13 08:20:00,778 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4350, loss[loss=0.08563, beats_loss=0.01179, ecapa_loss=0.0001727, whisper_loss=0.07212, over 21819.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01077, ecapa_loss=0.0001668, whisper_loss=0.09168, over 3839994.94 frames. ], batch size: 91, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:20:01,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2024-08-13 08:20:21,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2072530.0, ans=0.015 2024-08-13 08:20:37,869 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.675e-01 2024-08-13 08:21:07,343 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-13 08:21:11,504 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-13 08:21:19,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2072830.0, ans=0.0 2024-08-13 08:21:19,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2072830.0, ans=0.0 2024-08-13 08:21:23,790 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4400, loss[loss=0.0919, beats_loss=0.01259, ecapa_loss=0.0001541, whisper_loss=0.07777, over 21991.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01079, ecapa_loss=0.0001681, whisper_loss=0.09151, over 3857897.09 frames. 
], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:21:38,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.20 vs. limit=15.0 2024-08-13 08:21:54,054 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 08:21:59,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2073130.0, ans=0.125 2024-08-13 08:22:15,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2073230.0, ans=0.125 2024-08-13 08:22:30,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2073330.0, ans=0.125 2024-08-13 08:22:40,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.445e+01 2.799e+01 3.126e+01 5.864e+01, threshold=5.599e+01, percent-clipped=1.0 2024-08-13 08:22:42,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=12.0 2024-08-13 08:22:47,155 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4450, loss[loss=0.105, beats_loss=0.01274, ecapa_loss=0.0001687, whisper_loss=0.09063, over 20521.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01071, ecapa_loss=0.0001674, whisper_loss=0.09173, over 3868132.12 frames. ], batch size: 84, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:22:53,936 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-13 08:23:14,320 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 08:23:23,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2073630.0, ans=0.125 2024-08-13 08:23:29,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-08-13 08:23:35,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2073730.0, ans=0.125 2024-08-13 08:23:45,124 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 08:23:55,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-08-13 08:23:59,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2024-08-13 08:24:03,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2073830.0, ans=0.0 2024-08-13 08:24:07,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4500, loss[loss=0.0809, beats_loss=0.01483, ecapa_loss=8.641e-05, whisper_loss=0.06521, over 20899.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0108, ecapa_loss=0.0001659, whisper_loss=0.09128, over 3888422.00 frames. ], batch size: 79, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:24:28,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2074030.0, ans=0.125 2024-08-13 08:24:43,305 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-13 08:24:55,515 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-13 08:24:57,507 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 08:24:58,708 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-13 08:25:00,147 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 20 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-13 08:25:06,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2024-08-13 08:25:15,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.670e+01 3.024e+01 4.135e+01, threshold=5.340e+01, percent-clipped=0.0 2024-08-13 08:25:23,079 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4550, loss[loss=0.1204, beats_loss=0.01031, ecapa_loss=0.0001862, whisper_loss=0.1082, over 22817.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001667, whisper_loss=0.09178, over 3904808.21 frames. ], batch size: 94, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:25:48,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2024-08-13 08:25:52,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2074630.0, ans=0.125 2024-08-13 08:26:01,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. 
limit=15.0 2024-08-13 08:26:02,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2074630.0, ans=0.2 2024-08-13 08:26:07,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2074730.0, ans=0.0 2024-08-13 08:26:08,578 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-13 08:26:10,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2074730.0, ans=0.125 2024-08-13 08:26:10,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-13 08:26:11,612 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 08:26:13,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2074730.0, ans=0.0 2024-08-13 08:26:34,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4600, loss[loss=0.1137, beats_loss=0.01054, ecapa_loss=0.0001635, whisper_loss=0.1016, over 22069.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.000165, whisper_loss=0.09203, over 3904007.93 frames. ], batch size: 89, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:26:56,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2075030.0, ans=0.125 2024-08-13 08:27:05,705 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 08:27:08,942 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 
28 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 08:27:30,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-13 08:27:34,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2075330.0, ans=0.125 2024-08-13 08:27:36,000 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 08:27:39,565 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 08:27:41,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-13 08:27:42,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.369e+01 2.617e+01 2.923e+01 4.349e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-13 08:27:46,701 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 08:27:49,120 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4650, loss[loss=0.09333, beats_loss=0.01071, ecapa_loss=0.0001414, whisper_loss=0.0812, over 17964.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01077, ecapa_loss=0.000166, whisper_loss=0.09183, over 3909143.22 frames. ], batch size: 70, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:27:53,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2075430.0, ans=0.125 2024-08-13 08:28:07,561 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-13 08:28:08,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.63 vs. 
limit=15.0 2024-08-13 08:28:32,337 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 20 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-13 08:28:48,756 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 30 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-13 08:28:53,695 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 08:29:04,535 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4700, loss[loss=0.1114, beats_loss=0.009754, ecapa_loss=0.0002081, whisper_loss=0.09953, over 19527.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01076, ecapa_loss=0.0001648, whisper_loss=0.09218, over 3901517.89 frames. ], batch size: 80, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:29:09,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2075930.0, ans=0.0 2024-08-13 08:29:13,237 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 08:29:29,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2076030.0, ans=0.125 2024-08-13 08:29:50,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2076230.0, ans=0.125 2024-08-13 08:29:54,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2076230.0, ans=0.125 2024-08-13 08:30:08,611 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 
16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-13 08:30:13,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.493e+01 2.764e+01 3.080e+01 1.960e+02, threshold=5.528e+01, percent-clipped=2.0 2024-08-13 08:30:13,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2076330.0, ans=0.0 2024-08-13 08:30:20,467 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4750, loss[loss=0.1195, beats_loss=0.007457, ecapa_loss=0.0002004, whisper_loss=0.11, over 21300.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01087, ecapa_loss=0.0001644, whisper_loss=0.09159, over 3896984.23 frames. ], batch size: 86, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:30:20,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2076430.0, ans=0.0 2024-08-13 08:30:21,918 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-13 08:30:31,126 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 08:30:32,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2076430.0, ans=0.0 2024-08-13 08:30:41,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2076530.0, ans=0.125 2024-08-13 08:31:07,492 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 38 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-13 08:31:11,962 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 08:31:14,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2076730.0, ans=0.125 2024-08-13 08:31:16,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-13 08:31:28,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2076830.0, ans=0.125 2024-08-13 08:31:34,484 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4800, loss[loss=0.08565, beats_loss=0.01073, ecapa_loss=0.0001882, whisper_loss=0.07304, over 13878.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01079, ecapa_loss=0.0001656, whisper_loss=0.09207, over 3903532.03 frames. ], batch size: 57, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:31:36,125 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 08:31:36,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2076930.0, ans=0.0 2024-08-13 08:31:56,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=12.0 2024-08-13 08:32:18,363 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 08:32:29,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2077230.0, ans=0.2 2024-08-13 08:32:42,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.414e+01 2.705e+01 2.995e+01 6.816e+01, threshold=5.410e+01, percent-clipped=1.0 2024-08-13 08:32:49,597 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4850, loss[loss=0.1219, beats_loss=0.009503, ecapa_loss=0.0001668, whisper_loss=0.1107, over 23147.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001655, whisper_loss=0.09133, over 3914712.15 frames. ], batch size: 91, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:32:51,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2077430.0, ans=0.125 2024-08-13 08:33:02,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2077530.0, ans=0.1 2024-08-13 08:33:04,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2077530.0, ans=0.0 2024-08-13 08:33:21,457 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-13 08:33:24,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2024-08-13 08:33:27,367 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 08:33:43,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-13 08:33:50,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2077830.0, ans=0.125 2024-08-13 08:34:02,157 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4900, loss[loss=0.07967, beats_loss=0.01001, ecapa_loss=0.0002184, whisper_loss=0.06747, over 16568.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01075, ecapa_loss=0.0001662, whisper_loss=0.09231, over 3907282.41 frames. ], batch size: 70, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:34:03,615 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 08:34:23,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2078030.0, ans=0.0 2024-08-13 08:34:29,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.08 vs. limit=22.5 2024-08-13 08:34:43,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2078230.0, ans=0.0 2024-08-13 08:34:46,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2078230.0, ans=0.125 2024-08-13 08:35:02,040 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 08:35:06,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.464e+01 2.767e+01 3.041e+01 1.306e+02, threshold=5.534e+01, percent-clipped=2.0 2024-08-13 08:35:06,266 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 08:35:10,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2078330.0, ans=0.125 2024-08-13 08:35:12,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 4950, loss[loss=0.1119, beats_loss=0.01018, ecapa_loss=0.0001514, whisper_loss=0.1002, over 23961.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001652, whisper_loss=0.09169, over 3883935.60 frames. ], batch size: 93, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:35:15,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2078430.0, ans=0.125 2024-08-13 08:35:17,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. 
limit=15.0 2024-08-13 08:35:19,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.39 vs. limit=15.0 2024-08-13 08:35:26,947 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 08:35:31,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2078530.0, ans=0.1 2024-08-13 08:35:36,693 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 17 from LS+wenet, 26 from Vox, 50 fro AS 2024-08-13 08:35:54,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2024-08-13 08:35:56,907 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-13 08:36:00,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2078730.0, ans=0.125 2024-08-13 08:36:15,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2078830.0, ans=0.125 2024-08-13 08:36:15,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2078830.0, ans=0.1 2024-08-13 08:36:16,234 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-13 08:36:22,752 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5000, loss[loss=0.09295, beats_loss=0.01162, ecapa_loss=0.0001775, whisper_loss=0.07955, over 15521.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01081, ecapa_loss=0.0001656, whisper_loss=0.09168, over 3857118.55 frames. 
], batch size: 66, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:36:39,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2079030.0, ans=0.125 2024-08-13 08:36:42,345 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.002e+01 2024-08-13 08:36:54,117 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 08:37:10,489 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 35 from Vox, 34 fro AS 2024-08-13 08:37:10,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2079230.0, ans=0.125 2024-08-13 08:37:16,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2079330.0, ans=0.125 2024-08-13 08:37:17,277 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 08:37:24,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.436e+01 2.730e+01 3.076e+01 4.220e+01, threshold=5.460e+01, percent-clipped=0.0 2024-08-13 08:37:30,809 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5050, loss[loss=0.09845, beats_loss=0.01057, ecapa_loss=0.0001545, whisper_loss=0.08633, over 22933.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.000166, whisper_loss=0.09132, over 3869397.53 frames. ], batch size: 93, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:37:33,562 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-13 08:37:45,496 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.562e+05 2024-08-13 08:37:49,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2079530.0, ans=0.125 2024-08-13 08:37:50,532 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 08:38:08,142 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 08:38:32,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2079830.0, ans=0.2 2024-08-13 08:38:37,794 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5100, loss[loss=0.1035, beats_loss=0.01123, ecapa_loss=0.0001473, whisper_loss=0.09079, over 21015.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0109, ecapa_loss=0.000165, whisper_loss=0.09188, over 3885746.70 frames. 
], batch size: 84, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:38:38,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2079930.0, ans=0.125 2024-08-13 08:38:45,871 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-208000.pt 2024-08-13 08:38:53,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2080030.0, ans=0.125 2024-08-13 08:39:06,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2080130.0, ans=0.1 2024-08-13 08:39:10,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-08-13 08:39:19,960 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 08:39:34,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2080330.0, ans=0.1 2024-08-13 08:39:41,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.313e+01 2.678e+01 2.870e+01 5.220e+01, threshold=5.355e+01, percent-clipped=0.0 2024-08-13 08:39:42,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2080330.0, ans=0.125 2024-08-13 08:39:48,481 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5150, loss[loss=0.1083, beats_loss=0.009495, ecapa_loss=0.0001559, whisper_loss=0.09727, over 22916.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001651, whisper_loss=0.09177, over 3879751.80 frames. ], batch size: 90, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:39:56,912 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 12 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 08:40:06,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2080530.0, ans=0.125 2024-08-13 08:40:12,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2080530.0, ans=0.125 2024-08-13 08:40:17,400 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-13 08:40:21,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2024-08-13 08:40:24,271 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-13 08:40:28,252 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 08:40:32,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2080730.0, ans=0.07 2024-08-13 08:40:49,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2080830.0, ans=0.125 2024-08-13 08:40:56,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2080930.0, ans=0.125 2024-08-13 08:40:57,273 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5200, loss[loss=0.1002, beats_loss=0.01107, ecapa_loss=0.0001945, whisper_loss=0.08716, over 21468.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01089, ecapa_loss=0.0001645, whisper_loss=0.09185, over 3914139.55 frames. 
], batch size: 88, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:41:20,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2081030.0, ans=0.1 2024-08-13 08:41:26,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2024-08-13 08:41:45,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2081230.0, ans=0.125 2024-08-13 08:41:59,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.338e+01 2.575e+01 2.873e+01 5.976e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-13 08:42:06,555 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5250, loss[loss=0.1147, beats_loss=0.008521, ecapa_loss=0.0001599, whisper_loss=0.1046, over 15628.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.000165, whisper_loss=0.09159, over 3869848.43 frames. ], batch size: 61, lr: 4.21e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:42:27,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2081530.0, ans=0.0 2024-08-13 08:42:28,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2081530.0, ans=0.125 2024-08-13 08:42:37,512 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 20 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-13 08:42:42,883 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 13 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 08:42:46,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2024-08-13 08:43:13,469 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 
19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 08:43:13,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2081930.0, ans=0.5 2024-08-13 08:43:14,610 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5300, loss[loss=0.09562, beats_loss=0.01078, ecapa_loss=0.0001428, whisper_loss=0.08342, over 16569.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.0001647, whisper_loss=0.09164, over 3869525.64 frames. ], batch size: 65, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:43:21,774 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-13 08:43:27,220 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 29 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-13 08:43:41,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2082130.0, ans=0.09899494936611666 2024-08-13 08:43:49,047 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 12 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 08:43:53,480 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 08:43:53,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2082130.0, ans=0.2 2024-08-13 08:44:12,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.88 vs. 
limit=15.0 2024-08-13 08:44:13,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2082330.0, ans=0.125 2024-08-13 08:44:17,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.493e+01 2.716e+01 3.005e+01 4.281e+01, threshold=5.431e+01, percent-clipped=0.0 2024-08-13 08:44:24,058 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5350, loss[loss=0.08669, beats_loss=0.01365, ecapa_loss=0.0001419, whisper_loss=0.07161, over 21248.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01084, ecapa_loss=0.0001646, whisper_loss=0.09152, over 3859111.55 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:44:52,566 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 27 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-13 08:44:56,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-08-13 08:44:57,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2082630.0, ans=0.2 2024-08-13 08:45:10,599 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-13 08:45:12,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2082730.0, ans=0.1 2024-08-13 08:45:19,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2082830.0, ans=0.125 2024-08-13 08:45:32,491 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5400, loss[loss=0.128, beats_loss=0.00912, ecapa_loss=0.0001767, whisper_loss=0.1171, over 23410.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0108, ecapa_loss=0.0001651, whisper_loss=0.09184, over 3885454.92 frames. 
], batch size: 91, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:45:46,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2083030.0, ans=0.0 2024-08-13 08:45:59,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2083130.0, ans=0.1 2024-08-13 08:46:02,415 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-13 08:46:12,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2083230.0, ans=0.09899494936611666 2024-08-13 08:46:18,342 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 08:46:19,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.00 vs. limit=22.5 2024-08-13 08:46:26,486 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 26 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-13 08:46:26,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2083330.0, ans=0.1 2024-08-13 08:46:29,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2083330.0, ans=0.125 2024-08-13 08:46:34,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.479e+01 2.684e+01 3.109e+01 1.549e+02, threshold=5.369e+01, percent-clipped=2.0 2024-08-13 08:46:38,464 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-13 08:46:38,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2083330.0, ans=0.125 2024-08-13 08:46:40,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5450, loss[loss=0.1053, beats_loss=0.0109, ecapa_loss=0.0001764, whisper_loss=0.09264, over 21874.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001664, whisper_loss=0.09148, over 3893033.54 frames. ], batch size: 87, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:46:45,099 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 08:46:59,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2083530.0, ans=0.1 2024-08-13 08:47:03,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2083530.0, ans=0.1 2024-08-13 08:47:13,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2024-08-13 08:47:14,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-08-13 08:47:16,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0 2024-08-13 08:47:26,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. 
limit=10.0 2024-08-13 08:47:32,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2083730.0, ans=0.2 2024-08-13 08:47:35,715 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 08:47:40,458 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 08:47:46,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2083830.0, ans=0.0 2024-08-13 08:47:48,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2083830.0, ans=0.125 2024-08-13 08:47:59,009 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5500, loss[loss=0.1213, beats_loss=0.009382, ecapa_loss=0.0001854, whisper_loss=0.1101, over 17362.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01089, ecapa_loss=0.0001645, whisper_loss=0.09088, over 3892537.65 frames. ], batch size: 69, lr: 4.20e-03, grad_scale: 5.764607523034235e+17 2024-08-13 08:47:59,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2083930.0, ans=0.09899494936611666 2024-08-13 08:48:04,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2083930.0, ans=0.2 2024-08-13 08:48:04,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. 
limit=15.0
2024-08-13 08:48:10,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2083930.0, ans=0.0
2024-08-13 08:48:11,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2084030.0, ans=0.0
2024-08-13 08:48:25,747 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 from AS
2024-08-13 08:48:33,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2084130.0, ans=0.125
2024-08-13 08:48:38,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2084130.0, ans=0.0
2024-08-13 08:49:02,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2084230.0, ans=0.04949747468305833
2024-08-13 08:49:16,872 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.483e+01 2.752e+01 3.066e+01 5.816e+01, threshold=5.504e+01, percent-clipped=1.0
2024-08-13 08:49:22,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2084330.0, ans=0.125
2024-08-13 08:49:26,432 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5550, loss[loss=0.1086, beats_loss=0.01064, ecapa_loss=0.0001931, whisper_loss=0.09602, over 22221.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01092, ecapa_loss=0.0001641, whisper_loss=0.09096, over 3929281.10 frames. ], batch size: 95, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:49:38,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2084430.0, ans=0.125
2024-08-13 08:49:38,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2084430.0, ans=0.125
2024-08-13 08:49:42,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. limit=10.0
2024-08-13 08:49:57,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2084530.0, ans=0.125
2024-08-13 08:50:00,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2084530.0, ans=0.0
2024-08-13 08:50:33,806 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 from AS
2024-08-13 08:50:41,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2024-08-13 08:50:59,710 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5600, loss[loss=0.1132, beats_loss=0.01259, ecapa_loss=0.0001767, whisper_loss=0.09884, over 22746.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001652, whisper_loss=0.0919, over 3939120.28 frames. ], batch size: 93, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:51:06,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2084930.0, ans=0.0
2024-08-13 08:51:11,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2084930.0, ans=0.125
2024-08-13 08:51:12,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2084930.0, ans=0.125
2024-08-13 08:51:25,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2085030.0, ans=0.0
2024-08-13 08:51:34,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0
2024-08-13 08:51:53,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2085130.0, ans=0.1
2024-08-13 08:52:33,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.389e+01 2.717e+01 3.076e+01 5.909e+01, threshold=5.434e+01, percent-clipped=1.0
2024-08-13 08:52:37,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2085330.0, ans=0.1
2024-08-13 08:52:43,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5650, loss[loss=0.09984, beats_loss=0.01206, ecapa_loss=0.0001423, whisper_loss=0.08636, over 22209.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.0001647, whisper_loss=0.09148, over 3937021.08 frames. ], batch size: 88, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:53:28,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2085630.0, ans=0.2
2024-08-13 08:53:43,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2085730.0, ans=0.125
2024-08-13 08:53:53,494 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 from AS
2024-08-13 08:54:02,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2085830.0, ans=0.125
2024-08-13 08:54:09,739 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS
2024-08-13 08:54:11,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2085830.0, ans=10.0
2024-08-13 08:54:16,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2085930.0, ans=0.125
2024-08-13 08:54:17,611 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5700, loss[loss=0.1068, beats_loss=0.01119, ecapa_loss=0.0001647, whisper_loss=0.09397, over 21329.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01088, ecapa_loss=0.0001654, whisper_loss=0.09132, over 3946047.48 frames. ], batch size: 83, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:54:18,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2085930.0, ans=0.2
2024-08-13 08:54:23,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2085930.0, ans=0.125
2024-08-13 08:54:26,105 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 from AS
2024-08-13 08:54:30,067 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 22 from LS+wenet, 15 from Vox, 36 from AS
2024-08-13 08:55:02,238 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 from AS
2024-08-13 08:55:10,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0
2024-08-13 08:55:15,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2086230.0, ans=0.125
2024-08-13 08:55:25,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.405e+01 2.655e+01 3.007e+01 4.478e+01, threshold=5.310e+01, percent-clipped=0.0
2024-08-13 08:55:26,990 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 17 from Vox, 38 from AS
2024-08-13 08:55:33,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5750, loss[loss=0.1072, beats_loss=0.009997, ecapa_loss=0.0001452, whisper_loss=0.09578, over 19485.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01096, ecapa_loss=0.0001654, whisper_loss=0.09055, over 3939108.27 frames. ], batch size: 77, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:55:38,162 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 from AS
2024-08-13 08:55:41,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0
2024-08-13 08:56:00,493 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 34 from LS+wenet, 15 from Vox, 26 from AS
2024-08-13 08:56:03,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2086630.0, ans=0.0
2024-08-13 08:56:03,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2086630.0, ans=0.0
2024-08-13 08:56:06,614 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.748e+01
2024-08-13 08:56:20,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0
2024-08-13 08:56:21,790 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 from AS
2024-08-13 08:56:21,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2086730.0, ans=0.0
2024-08-13 08:56:33,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2086730.0, ans=0.0
2024-08-13 08:56:38,880 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 18 from LS+wenet, 13 from Vox, 37 from AS
2024-08-13 08:56:44,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2086830.0, ans=0.0
2024-08-13 08:56:47,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2086830.0, ans=0.1
2024-08-13 08:56:51,825 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5800, loss[loss=0.125, beats_loss=0.004481, ecapa_loss=0.000211, whisper_loss=0.1184, over 15215.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01092, ecapa_loss=0.0001665, whisper_loss=0.09078, over 3923148.74 frames. ], batch size: 59, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:57:08,780 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS
2024-08-13 08:57:32,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0
2024-08-13 08:57:39,094 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 15 from Vox, 32 from AS
2024-08-13 08:57:40,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2087230.0, ans=0.0
2024-08-13 08:57:45,342 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 23 from Vox, 44 from AS
2024-08-13 08:57:59,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2087330.0, ans=0.0
2024-08-13 08:58:02,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.414e+01 2.686e+01 3.038e+01 9.495e+01, threshold=5.372e+01, percent-clipped=3.0
2024-08-13 08:58:09,850 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5850, loss[loss=0.1185, beats_loss=0.01032, ecapa_loss=0.0001695, whisper_loss=0.1065, over 21154.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01091, ecapa_loss=0.0001659, whisper_loss=0.0914, over 3914842.59 frames. ], batch size: 84, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:58:14,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2087430.0, ans=0.125
2024-08-13 08:58:14,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0
2024-08-13 08:58:39,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2087530.0, ans=0.2
2024-08-13 08:58:42,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2087630.0, ans=0.0
2024-08-13 08:58:57,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2087730.0, ans=0.05
2024-08-13 08:59:05,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2087730.0, ans=0.0
2024-08-13 08:59:05,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2087730.0, ans=0.0
2024-08-13 08:59:11,381 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 16 from Vox, 23 from AS
2024-08-13 08:59:14,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2087830.0, ans=0.0
2024-08-13 08:59:15,519 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 from AS
2024-08-13 08:59:28,475 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5900, loss[loss=0.1028, beats_loss=0.009882, ecapa_loss=0.0001757, whisper_loss=0.09117, over 17887.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.0001646, whisper_loss=0.09091, over 3859550.57 frames. ], batch size: 71, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 08:59:35,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2087930.0, ans=0.1
2024-08-13 08:59:43,784 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 15 from LS+wenet, 15 from Vox, 32 from AS
2024-08-13 08:59:53,083 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 from AS
2024-08-13 08:59:54,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2088030.0, ans=0.0
2024-08-13 09:00:11,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2088130.0, ans=0.125
2024-08-13 09:00:26,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0
2024-08-13 09:00:29,720 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 25 from Vox, 26 from AS
2024-08-13 09:00:39,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.419e+01 2.634e+01 3.004e+01 5.084e+01, threshold=5.268e+01, percent-clipped=0.0
2024-08-13 09:00:45,770 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 18 from Vox, 16 from AS
2024-08-13 09:00:47,065 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 5950, loss[loss=0.1173, beats_loss=0.008136, ecapa_loss=0.0002063, whisper_loss=0.1071, over 13191.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01091, ecapa_loss=0.000165, whisper_loss=0.09099, over 3840353.37 frames. ], batch size: 54, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:00:47,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2088430.0, ans=0.2
2024-08-13 09:00:50,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2088430.0, ans=0.1
2024-08-13 09:01:00,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.60 vs. limit=15.0
2024-08-13 09:01:22,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2088630.0, ans=0.0
2024-08-13 09:01:36,397 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 from AS
2024-08-13 09:01:38,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2088730.0, ans=0.0
2024-08-13 09:01:43,859 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 11 from Vox, 43 from AS
2024-08-13 09:01:46,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0
2024-08-13 09:02:06,992 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6000, loss[loss=0.09702, beats_loss=0.01205, ecapa_loss=0.0001423, whisper_loss=0.08354, over 17869.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01094, ecapa_loss=0.0001636, whisper_loss=0.09085, over 3829017.45 frames. ], batch size: 69, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:02:06,994 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss
2024-08-13 09:02:46,557 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on ASR_libri: loss=0.2545, beats_loss=0, ecapa_loss=0.0005583, whisper_loss=0.2489, over 922467.00 frames.
2024-08-13 09:02:52,668 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6821, 1.7373, 2.2250, 2.3351], device='cuda:0')
2024-08-13 09:03:04,008 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on SV_voxceleb1: loss=0.004508, beats_loss=0, ecapa_loss=0.0004508, whisper_loss=0, over 939242.00 frames.
2024-08-13 09:04:46,979 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0221, 2.8341, 2.7726, 2.5965], device='cuda:0')
2024-08-13 09:05:03,030 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-13 09:05:03,034 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB
2024-08-13 09:05:08,576 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 from AS
2024-08-13 09:05:35,892 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 31 from Vox, 36 from AS
2024-08-13 09:05:45,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2089130.0, ans=0.125
2024-08-13 09:05:49,920 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 38 from LS+wenet, 17 from Vox, 37 from AS
2024-08-13 09:06:09,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2089330.0, ans=0.1
2024-08-13 09:06:12,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.429e+01 2.733e+01 3.006e+01 6.424e+01, threshold=5.466e+01, percent-clipped=1.0
2024-08-13 09:06:12,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2089330.0, ans=0.0
2024-08-13 09:06:12,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.09 vs. limit=10.0
2024-08-13 09:06:14,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0
2024-08-13 09:06:18,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2089430.0, ans=0.0
2024-08-13 09:06:19,847 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6050, loss[loss=0.1163, beats_loss=0.01233, ecapa_loss=0.0001187, whisper_loss=0.1028, over 18821.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01093, ecapa_loss=0.0001632, whisper_loss=0.09111, over 3839666.96 frames. ], batch size: 71, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:06:20,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5
2024-08-13 09:06:25,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2089430.0, ans=0.125
2024-08-13 09:06:37,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5
2024-08-13 09:06:41,977 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 28 from LS+wenet, 19 from Vox, 23 from AS
2024-08-13 09:07:24,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2089830.0, ans=0.0
2024-08-13 09:07:33,363 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 35 from LS+wenet, 26 from Vox, 31 from AS
2024-08-13 09:07:41,911 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6100, loss[loss=0.1063, beats_loss=0.01105, ecapa_loss=0.0001469, whisper_loss=0.09376, over 20387.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.011, ecapa_loss=0.0001633, whisper_loss=0.09116, over 3853799.35 frames. ], batch size: 79, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:07:45,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2089930.0, ans=0.2
2024-08-13 09:07:48,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2089930.0, ans=0.125
2024-08-13 09:07:55,063 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 from AS
2024-08-13 09:08:07,500 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 24 from LS+wenet, 21 from Vox, 21 from AS
2024-08-13 09:08:10,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2090030.0, ans=0.1
2024-08-13 09:08:11,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0
2024-08-13 09:08:36,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2090230.0, ans=0.125
2024-08-13 09:08:41,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0
2024-08-13 09:08:55,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.306e+01 2.537e+01 2.839e+01 1.271e+02, threshold=5.074e+01, percent-clipped=1.0
2024-08-13 09:08:57,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2090330.0, ans=0.125
2024-08-13 09:09:00,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.82 vs. limit=22.5
2024-08-13 09:09:03,094 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6150, loss[loss=0.09571, beats_loss=0.009826, ecapa_loss=0.0001921, whisper_loss=0.08396, over 22149.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.011, ecapa_loss=0.0001639, whisper_loss=0.09131, over 3860630.79 frames. ], batch size: 90, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:09:09,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2090430.0, ans=0.09899494936611666
2024-08-13 09:09:14,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2090430.0, ans=0.125
2024-08-13 09:09:25,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2090530.0, ans=0.0
2024-08-13 09:09:36,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2090630.0, ans=0.125
2024-08-13 09:09:46,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2090630.0, ans=0.125
2024-08-13 09:09:55,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2090730.0, ans=0.125
2024-08-13 09:10:01,161 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 from AS
2024-08-13 09:10:23,378 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6200, loss[loss=0.1053, beats_loss=0.01267, ecapa_loss=0.0001219, whisper_loss=0.09137, over 23281.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01087, ecapa_loss=0.0001647, whisper_loss=0.09212, over 3859079.62 frames. ], batch size: 88, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:10:48,017 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 from AS
2024-08-13 09:10:49,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2091030.0, ans=0.04949747468305833
2024-08-13 09:11:04,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.500e+05
2024-08-13 09:11:24,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2091230.0, ans=0.125
2024-08-13 09:11:37,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.446e+01 2.761e+01 3.049e+01 5.001e+01, threshold=5.523e+01, percent-clipped=0.0
2024-08-13 09:11:45,298 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6250, loss[loss=0.09898, beats_loss=0.01155, ecapa_loss=0.0001178, whisper_loss=0.08626, over 18532.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01088, ecapa_loss=0.0001647, whisper_loss=0.0923, over 3884578.99 frames. ], batch size: 70, lr: 4.20e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:11:45,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2091430.0, ans=0.125
2024-08-13 09:11:56,200 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.658e-01
2024-08-13 09:12:01,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2091530.0, ans=0.125
2024-08-13 09:12:11,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2024-08-13 09:12:19,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0
2024-08-13 09:12:25,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2091630.0, ans=10.0
2024-08-13 09:12:57,807 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.862e-02
2024-08-13 09:13:00,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2091830.0, ans=0.2
2024-08-13 09:13:01,415 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 from AS
2024-08-13 09:13:01,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=2091830.0, ans=0.2
2024-08-13 09:13:05,962 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6300, loss[loss=0.08774, beats_loss=0.01132, ecapa_loss=0.0001668, whisper_loss=0.07475, over 19603.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01086, ecapa_loss=0.0001653, whisper_loss=0.09216, over 3872249.47 frames. ], batch size: 80, lr: 4.19e-03, grad_scale: 5.764607523034235e+17
2024-08-13 09:13:12,775 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 27 from LS+wenet, 17 from Vox, 48 from AS
2024-08-13 09:13:31,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2092030.0, ans=0.0
2024-08-13 09:13:36,325 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 from AS
2024-08-13 09:13:47,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2092130.0, ans=0.0
2024-08-13 09:13:49,028 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 from AS
2024-08-13 09:14:04,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=12.0
2024-08-13 09:14:12,489 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 from AS
2024-08-13 09:14:15,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2092330.0, ans=0.125
2024-08-13 09:14:16,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.061e+01 2.468e+01 2.785e+01 3.208e+01 1.167e+02, threshold=5.571e+01, percent-clipped=1.0
2024-08-13 09:14:23,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2092430.0, ans=0.1
2024-08-13 09:14:23,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2092430.0, ans=0.1
2024-08-13 09:14:24,606 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6350, loss[loss=0.08886, beats_loss=0.01121, ecapa_loss=0.0001304, whisper_loss=0.07635, over 15250.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01083, ecapa_loss=0.0001652, whisper_loss=0.0922, over 3860098.73 frames. ], batch size: 60, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:14:35,410 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 20 from Vox, 25 from AS
2024-08-13 09:14:42,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2092530.0, ans=0.2
2024-08-13 09:14:47,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2092530.0, ans=0.0
2024-08-13 09:14:48,876 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 from AS
2024-08-13 09:14:55,855 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS
2024-08-13 09:15:10,661 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 from AS
2024-08-13 09:15:35,537 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6400, loss[loss=0.08545, beats_loss=0.01308, ecapa_loss=0.0001438, whisper_loss=0.07094, over 16086.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001645, whisper_loss=0.09155, over 3859685.79 frames. ], batch size: 66, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:15:36,743 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 14 from Vox, 44 from AS
2024-08-13 09:15:40,609 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 from AS
2024-08-13 09:15:44,577 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS
2024-08-13 09:15:48,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2093030.0, ans=0.125
2024-08-13 09:16:11,880 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 16 from LS+wenet, 13 from Vox, 29 from AS
2024-08-13 09:16:17,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2093230.0, ans=0.2
2024-08-13 09:16:18,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2093230.0, ans=0.125
2024-08-13 09:16:22,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2093230.0, ans=0.1
2024-08-13 09:16:32,034 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 from AS
2024-08-13 09:16:34,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.483e+01 2.753e+01 3.245e+01 5.103e+01, threshold=5.505e+01, percent-clipped=0.0
2024-08-13 09:16:37,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2093330.0, ans=0.125
2024-08-13 09:16:41,180 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6450, loss[loss=0.1011, beats_loss=0.01268, ecapa_loss=0.0001427, whisper_loss=0.087, over 22865.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0109, ecapa_loss=0.0001642, whisper_loss=0.09219, over 3889483.10 frames. ], batch size: 92, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:16:48,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.01 vs. limit=22.5
2024-08-13 09:16:50,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2093430.0, ans=0.125
2024-08-13 09:17:09,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2093630.0, ans=0.125
2024-08-13 09:17:11,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5
2024-08-13 09:17:36,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2093830.0, ans=0.0
2024-08-13 09:17:46,591 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6500, loss[loss=0.09694, beats_loss=0.0125, ecapa_loss=0.0001302, whisper_loss=0.08314, over 17766.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01089, ecapa_loss=0.0001646, whisper_loss=0.09217, over 3897391.81 frames. ], batch size: 67, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:18:05,200 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS
2024-08-13 09:18:15,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.50 vs. limit=15.0
2024-08-13 09:18:21,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2094130.0, ans=0.125
2024-08-13 09:18:43,752 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 23 from Vox, 32 from AS
2024-08-13 09:18:46,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.544e+01 2.898e+01 3.309e+01 5.602e+01, threshold=5.795e+01, percent-clipped=1.0
2024-08-13 09:18:50,145 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 32 from LS+wenet, 25 from Vox, 37 from AS
2024-08-13 09:18:51,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2094430.0, ans=0.125
2024-08-13 09:18:52,768 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6550, loss[loss=0.08182, beats_loss=0.0145, ecapa_loss=0.0001899, whisper_loss=0.06541, over 21395.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01088, ecapa_loss=0.0001651, whisper_loss=0.09201, over 3909466.68 frames. ], batch size: 93, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:18:56,991 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 16 from LS+wenet, 17 from Vox, 31 from AS
2024-08-13 09:19:04,722 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 14 from Vox, 36 from AS
2024-08-13 09:19:13,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2094530.0, ans=0.1
2024-08-13 09:19:25,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2094630.0, ans=0.125
2024-08-13 09:19:41,177 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS
2024-08-13 09:19:41,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2094730.0, ans=0.0
2024-08-13 09:19:49,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2094830.0, ans=0.025
2024-08-13 09:19:54,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2094830.0, ans=0.0
2024-08-13 09:19:55,106 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS
2024-08-13 09:19:57,590 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6600, loss[loss=0.1267, beats_loss=0.008426, ecapa_loss=0.0001813, whisper_loss=0.1165, over 14085.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01082, ecapa_loss=0.0001664, whisper_loss=0.09204, over 3918374.59 frames. ], batch size: 55, lr: 4.19e-03, grad_scale: 1.152921504606847e+18
2024-08-13 09:19:59,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2094930.0, ans=0.2
2024-08-13 09:20:01,711 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS
2024-08-13 09:20:03,078 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 from AS
2024-08-13 09:20:05,563 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 30 from LS+wenet, 19 from Vox, 37 from AS
2024-08-13 09:20:12,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0
2024-08-13 09:20:17,603 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 31 from LS+wenet, 19 from Vox, 35 from AS
2024-08-13 09:20:24,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2095130.0, ans=0.125
2024-08-13 09:20:28,172 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS
2024-08-13 09:20:48,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2095230.0, ans=0.125
2024-08-13 09:20:48,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2095230.0, ans=0.1
2024-08-13 09:20:54,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2095330.0, ans=0.125
2024-08-13 09:20:56,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.422e+01 2.623e+01 3.004e+01 7.541e+01, threshold=5.247e+01, percent-clipped=2.0
2024-08-13 09:20:59,622 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 from AS
2024-08-13 09:21:03,436 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6650, loss[loss=0.1197, beats_loss=0.009268, ecapa_loss=0.0002004, whisper_loss=0.1085, over 19781.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01085, ecapa_loss=0.0001666, whisper_loss=0.09214, over 3926218.30 frames.
], batch size: 79, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:21:06,200 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-13 09:21:06,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2095430.0, ans=0.125 2024-08-13 09:21:13,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2095430.0, ans=0.2 2024-08-13 09:21:21,113 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-13 09:21:22,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=12.0 2024-08-13 09:21:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2095630.0, ans=0.125 2024-08-13 09:21:33,889 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 09:21:43,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2095730.0, ans=0.2 2024-08-13 09:21:48,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2095730.0, ans=0.125 2024-08-13 09:21:56,529 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 09:21:58,813 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 20 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-13 09:22:09,186 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6700, loss[loss=0.09786, beats_loss=0.01132, ecapa_loss=0.0001914, whisper_loss=0.08462, over 17274.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01082, ecapa_loss=0.0001675, whisper_loss=0.09259, over 3908415.63 frames. 
], batch size: 73, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:22:18,827 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-13 09:22:25,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2096030.0, ans=0.1 2024-08-13 09:22:26,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2096030.0, ans=0.125 2024-08-13 09:22:41,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2096130.0, ans=0.125 2024-08-13 09:22:49,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2096230.0, ans=0.125 2024-08-13 09:22:55,179 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 09:22:57,753 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 09:22:59,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2096230.0, ans=0.0 2024-08-13 09:23:07,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.450e+01 2.665e+01 3.008e+01 5.668e+01, threshold=5.331e+01, percent-clipped=2.0 2024-08-13 09:23:14,785 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6750, loss[loss=0.09797, beats_loss=0.008974, ecapa_loss=0.0002369, whisper_loss=0.08663, over 20105.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001678, whisper_loss=0.092, over 3880626.62 frames. 
], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:23:20,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2096430.0, ans=0.2 2024-08-13 09:23:29,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2096530.0, ans=0.0 2024-08-13 09:23:39,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5 2024-08-13 09:24:01,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2096730.0, ans=0.125 2024-08-13 09:24:02,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2096730.0, ans=0.04949747468305833 2024-08-13 09:24:04,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-08-13 09:24:11,514 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-13 09:24:20,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6800, loss[loss=0.1157, beats_loss=0.01022, ecapa_loss=0.0001866, whisper_loss=0.1036, over 23043.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01084, ecapa_loss=0.0001669, whisper_loss=0.0921, over 3882448.71 frames. ], batch size: 91, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:24:22,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2096930.0, ans=0.0 2024-08-13 09:24:25,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2096930.0, ans=0.0 2024-08-13 09:24:38,038 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-13 09:24:40,702 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 09:25:09,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2097230.0, ans=0.125 2024-08-13 09:25:10,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2097230.0, ans=0.0 2024-08-13 09:25:11,928 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:25:14,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-08-13 09:25:18,461 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-13 09:25:20,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2097330.0, ans=0.125 2024-08-13 09:25:20,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.429e+01 2.619e+01 3.014e+01 5.255e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-13 09:25:27,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6850, loss[loss=0.09608, beats_loss=0.01171, ecapa_loss=0.0001776, whisper_loss=0.0826, over 21462.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001666, whisper_loss=0.09115, over 3835964.94 frames. 
], batch size: 89, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:25:28,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2097430.0, ans=0.2 2024-08-13 09:25:33,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2097430.0, ans=0.125 2024-08-13 09:25:39,674 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 09:25:42,457 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.443e-02 2024-08-13 09:25:43,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-13 09:25:48,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2097530.0, ans=0.0 2024-08-13 09:25:52,597 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-13 09:25:52,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2097630.0, ans=0.125 2024-08-13 09:25:52,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2097630.0, ans=0.125 2024-08-13 09:26:00,365 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 09:26:04,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2097630.0, ans=0.125 2024-08-13 09:26:14,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2097730.0, ans=0.125 2024-08-13 09:26:14,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2097730.0, ans=0.125 2024-08-13 09:26:30,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2097830.0, ans=0.125 2024-08-13 09:26:33,016 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6900, loss[loss=0.09783, beats_loss=0.01037, ecapa_loss=0.0002033, whisper_loss=0.08543, over 15167.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01078, ecapa_loss=0.0001672, whisper_loss=0.09176, over 3838491.33 frames. ], batch size: 63, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:26:56,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2098030.0, ans=0.1 2024-08-13 09:27:00,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2098130.0, ans=0.2 2024-08-13 09:27:12,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2098230.0, ans=0.2 2024-08-13 09:27:18,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2098230.0, ans=0.125 2024-08-13 09:27:28,889 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-13 09:27:32,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.455e+01 2.903e+01 3.270e+01 5.847e+01, threshold=5.807e+01, percent-clipped=1.0 2024-08-13 09:27:39,162 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 6950, loss[loss=0.1003, beats_loss=0.01008, ecapa_loss=0.0001568, whisper_loss=0.08863, over 15729.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01082, ecapa_loss=0.0001668, whisper_loss=0.09151, over 3828639.47 frames. ], batch size: 64, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:28:04,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2098630.0, ans=0.0 2024-08-13 09:28:15,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2098630.0, ans=0.0 2024-08-13 09:28:22,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2098730.0, ans=0.125 2024-08-13 09:28:25,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2098730.0, ans=0.0 2024-08-13 09:28:31,400 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-13 09:28:39,510 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-13 09:28:41,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2098830.0, ans=0.125 2024-08-13 09:28:42,036 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 09:28:44,325 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7000, loss[loss=0.09834, beats_loss=0.01335, ecapa_loss=0.0001582, whisper_loss=0.0834, over 21409.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01087, ecapa_loss=0.000165, whisper_loss=0.09133, over 3806826.07 frames. ], batch size: 88, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:28:55,306 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 09:29:03,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2099030.0, ans=0.125 2024-08-13 09:29:09,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2099130.0, ans=0.1 2024-08-13 09:29:13,133 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 25 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 09:29:20,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2099130.0, ans=0.125 2024-08-13 09:29:24,766 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-13 09:29:29,792 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-13 09:29:37,814 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 24 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-13 09:29:42,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.399e+01 2.678e+01 3.214e+01 5.831e+01, threshold=5.356e+01, percent-clipped=1.0 2024-08-13 09:29:49,622 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7050, loss[loss=0.0814, beats_loss=0.0114, ecapa_loss=0.0001423, whisper_loss=0.06857, over 16432.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01079, ecapa_loss=0.0001652, whisper_loss=0.09183, over 3829721.22 frames. 
], batch size: 63, lr: 4.19e-03, grad_scale: 1.152921504606847e+18 2024-08-13 09:29:54,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2099430.0, ans=0.125 2024-08-13 09:30:20,027 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-13 09:30:28,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2099630.0, ans=0.0 2024-08-13 09:30:34,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2099730.0, ans=0.0 2024-08-13 09:30:38,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2099730.0, ans=0.0 2024-08-13 09:30:40,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2099730.0, ans=0.2 2024-08-13 09:30:45,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2099830.0, ans=0.125 2024-08-13 09:30:56,466 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 33 from Vox, 24 fro AS 2024-08-13 09:30:57,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2099830.0, ans=0.125 2024-08-13 09:31:00,375 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7100, loss[loss=0.08711, beats_loss=0.01502, ecapa_loss=0.0001521, whisper_loss=0.07057, over 21645.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01083, ecapa_loss=0.0001643, whisper_loss=0.09139, over 3836033.06 frames. 
], batch size: 92, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:31:02,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2024-08-13 09:31:06,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2099930.0, ans=0.0 2024-08-13 09:31:13,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2099930.0, ans=0.125 2024-08-13 09:31:13,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2099930.0, ans=0.0 2024-08-13 09:31:26,908 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 09:31:44,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-13 09:32:00,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2024-08-13 09:32:08,804 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.488e+01 2.756e+01 3.074e+01 1.860e+02, threshold=5.512e+01, percent-clipped=2.0 2024-08-13 09:32:08,941 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-13 09:32:11,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2100330.0, ans=0.125 2024-08-13 09:32:14,903 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7150, loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001494, whisper_loss=0.08993, over 20086.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01089, ecapa_loss=0.0001645, whisper_loss=0.09079, over 3825125.64 frames. ], batch size: 78, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:32:24,910 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 09:32:27,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2024-08-13 09:32:30,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2100530.0, ans=0.0 2024-08-13 09:32:44,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2100630.0, ans=0.125 2024-08-13 09:33:03,960 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.240e+01 2024-08-13 09:33:05,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2100730.0, ans=0.025 2024-08-13 09:33:09,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2100730.0, ans=15.0 2024-08-13 09:33:16,300 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 09:33:17,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2100830.0, ans=0.09899494936611666 2024-08-13 09:33:21,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2100830.0, ans=0.125 2024-08-13 09:33:26,943 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
39 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 09:33:29,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7200, loss[loss=0.1039, beats_loss=0.009124, ecapa_loss=0.0001847, whisper_loss=0.09293, over 16482.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001644, whisper_loss=0.09125, over 3827379.37 frames. ], batch size: 68, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:33:31,530 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 09:33:35,653 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 09:33:48,630 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-13 09:33:49,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-13 09:34:03,743 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-13 09:34:12,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2101230.0, ans=0.125 2024-08-13 09:34:25,044 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 09:34:26,511 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 09:34:38,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.408e+01 2.663e+01 2.960e+01 8.950e+01, threshold=5.327e+01, percent-clipped=1.0 2024-08-13 09:34:44,227 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7250, loss[loss=0.1087, beats_loss=0.01036, ecapa_loss=0.0001692, whisper_loss=0.09666, over 23368.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01082, ecapa_loss=0.0001649, whisper_loss=0.09174, over 3874799.79 frames. 
], batch size: 93, lr: 4.19e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:35:07,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2101530.0, ans=0.125 2024-08-13 09:35:26,434 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 09:35:51,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2024-08-13 09:35:51,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2101830.0, ans=0.0 2024-08-13 09:35:58,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2101930.0, ans=0.1 2024-08-13 09:35:59,580 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7300, loss[loss=0.101, beats_loss=0.0114, ecapa_loss=0.000173, whisper_loss=0.08784, over 22766.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001646, whisper_loss=0.09185, over 3898981.68 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:35:59,704 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 09:36:01,314 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 09:36:12,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2101930.0, ans=0.1 2024-08-13 09:36:13,644 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-13 09:36:25,449 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-13 09:36:39,827 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-13 09:36:46,282 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 09:36:46,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2102230.0, ans=0.125 2024-08-13 09:36:52,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-08-13 09:37:08,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.467e+01 2.644e+01 2.965e+01 8.104e+01, threshold=5.287e+01, percent-clipped=3.0 2024-08-13 09:37:13,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.20 vs. limit=10.0 2024-08-13 09:37:14,138 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7350, loss[loss=0.09547, beats_loss=0.01469, ecapa_loss=0.000138, whisper_loss=0.0794, over 18046.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01085, ecapa_loss=0.000165, whisper_loss=0.09186, over 3905160.60 frames. ], batch size: 72, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:37:27,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2102530.0, ans=0.0 2024-08-13 09:38:00,354 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 31 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-13 09:38:22,388 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
28 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-13 09:38:22,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2102830.0, ans=0.125 2024-08-13 09:38:28,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7400, loss[loss=0.09738, beats_loss=0.01015, ecapa_loss=0.0001653, whisper_loss=0.08558, over 18351.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01087, ecapa_loss=0.0001652, whisper_loss=0.09163, over 3911088.49 frames. ], batch size: 73, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:38:47,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2103030.0, ans=15.0 2024-08-13 09:38:58,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2103130.0, ans=0.0 2024-08-13 09:39:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2103130.0, ans=0.2 2024-08-13 09:39:25,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2103230.0, ans=0.0 2024-08-13 09:39:28,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2103230.0, ans=0.07 2024-08-13 09:39:38,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.04 vs. 
limit=15.0 2024-08-13 09:39:40,624 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.473e+01 2.699e+01 3.080e+01 4.653e+01, threshold=5.397e+01, percent-clipped=0.0 2024-08-13 09:39:45,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2103330.0, ans=0.1 2024-08-13 09:39:47,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7450, loss[loss=0.1213, beats_loss=0.008727, ecapa_loss=0.0001649, whisper_loss=0.1109, over 20079.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01086, ecapa_loss=0.0001657, whisper_loss=0.09184, over 3916904.48 frames. ], batch size: 76, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:39:54,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.85 vs. limit=15.0 2024-08-13 09:40:04,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2103530.0, ans=0.125 2024-08-13 09:40:11,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-13 09:40:14,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2103530.0, ans=0.2 2024-08-13 09:40:16,273 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-13 09:40:24,875 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-13 09:40:27,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2103630.0, ans=0.125 2024-08-13 09:40:27,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2103630.0, ans=0.1 2024-08-13 09:40:53,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-13 09:40:54,637 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 09:41:03,515 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7500, loss[loss=0.08744, beats_loss=0.01445, ecapa_loss=0.00015, whisper_loss=0.07148, over 22286.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01086, ecapa_loss=0.0001654, whisper_loss=0.09153, over 3907467.33 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:41:13,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2103930.0, ans=0.1 2024-08-13 09:41:19,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-08-13 09:41:23,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2104030.0, ans=0.0 2024-08-13 09:41:31,781 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-13 09:41:33,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2104130.0, ans=0.04949747468305833 2024-08-13 09:42:07,408 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 
21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 09:42:11,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.360e+01 2.624e+01 2.937e+01 1.240e+02, threshold=5.248e+01, percent-clipped=1.0 2024-08-13 09:42:17,266 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7550, loss[loss=0.1003, beats_loss=0.009076, ecapa_loss=0.0002315, whisper_loss=0.08888, over 12961.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01086, ecapa_loss=0.0001666, whisper_loss=0.09155, over 3906911.17 frames. ], batch size: 54, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:42:30,420 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 30 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-13 09:43:07,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2104730.0, ans=0.05 2024-08-13 09:43:14,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-13 09:43:25,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2024-08-13 09:43:32,203 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7600, loss[loss=0.1098, beats_loss=0.01021, ecapa_loss=0.0001631, whisper_loss=0.09793, over 16587.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01085, ecapa_loss=0.0001665, whisper_loss=0.09146, over 3902759.98 frames. ], batch size: 65, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:43:37,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2104930.0, ans=0.125 2024-08-13 09:43:54,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. 
limit=15.0 2024-08-13 09:44:04,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2105130.0, ans=0.125 2024-08-13 09:44:10,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2105130.0, ans=0.125 2024-08-13 09:44:37,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2024-08-13 09:44:38,228 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 24 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-13 09:44:41,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.428e+01 2.721e+01 3.053e+01 1.709e+02, threshold=5.443e+01, percent-clipped=2.0 2024-08-13 09:44:46,656 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7650, loss[loss=0.1226, beats_loss=0.006329, ecapa_loss=0.0001893, whisper_loss=0.1144, over 15546.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01076, ecapa_loss=0.0001662, whisper_loss=0.09165, over 3901988.05 frames. ], batch size: 60, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:44:51,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.08 vs. 
limit=22.5 2024-08-13 09:44:57,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2105430.0, ans=0.0 2024-08-13 09:45:08,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2105530.0, ans=10.0 2024-08-13 09:45:16,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2105630.0, ans=0.125 2024-08-13 09:45:26,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2105630.0, ans=0.125 2024-08-13 09:45:27,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2105630.0, ans=0.1 2024-08-13 09:46:02,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7700, loss[loss=0.104, beats_loss=0.01104, ecapa_loss=0.0001735, whisper_loss=0.09124, over 18752.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001647, whisper_loss=0.0912, over 3921159.01 frames. ], batch size: 76, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:46:09,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-08-13 09:46:21,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2106030.0, ans=0.125 2024-08-13 09:46:23,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2106030.0, ans=0.2 2024-08-13 09:46:36,948 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
23 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 09:46:44,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2106130.0, ans=0.2 2024-08-13 09:46:44,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2106130.0, ans=0.05 2024-08-13 09:46:58,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2106230.0, ans=0.2 2024-08-13 09:47:00,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.75 vs. limit=5.0 2024-08-13 09:47:07,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-08-13 09:47:11,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2024-08-13 09:47:12,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.458e+01 2.712e+01 3.112e+01 4.115e+01, threshold=5.423e+01, percent-clipped=0.0 2024-08-13 09:47:12,812 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:47:18,022 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7750, loss[loss=0.08772, beats_loss=0.01229, ecapa_loss=0.0001205, whisper_loss=0.07423, over 19750.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001645, whisper_loss=0.09081, over 3936502.26 frames. ], batch size: 77, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:47:21,510 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 09:47:30,108 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
21 from LS+wenet, 23 from Vox, 14 fro AS 2024-08-13 09:47:33,268 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-13 09:47:43,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-08-13 09:47:55,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2106630.0, ans=0.0 2024-08-13 09:48:02,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2106630.0, ans=0.125 2024-08-13 09:48:19,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2024-08-13 09:48:24,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2106830.0, ans=0.125 2024-08-13 09:48:35,000 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7800, loss[loss=0.08332, beats_loss=0.01, ecapa_loss=0.000134, whisper_loss=0.07198, over 14242.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01083, ecapa_loss=0.000164, whisper_loss=0.09121, over 3928555.07 frames. ], batch size: 54, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:49:00,781 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:49:05,511 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 
31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 09:49:16,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2107130.0, ans=0.1 2024-08-13 09:49:45,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.478e+01 2.776e+01 3.061e+01 6.531e+01, threshold=5.553e+01, percent-clipped=2.0 2024-08-13 09:49:46,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2107330.0, ans=0.1 2024-08-13 09:49:46,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2107330.0, ans=0.125 2024-08-13 09:49:48,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2107330.0, ans=0.05 2024-08-13 09:49:51,053 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7850, loss[loss=0.1036, beats_loss=0.01315, ecapa_loss=0.0001772, whisper_loss=0.08865, over 21533.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01091, ecapa_loss=0.0001647, whisper_loss=0.09052, over 3894779.00 frames. ], batch size: 92, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:50:00,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2107430.0, ans=0.125 2024-08-13 09:50:01,794 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 
22 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-13 09:50:02,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2107430.0, ans=0.5 2024-08-13 09:50:21,821 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 09:50:28,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2024-08-13 09:50:29,188 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 09:50:29,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2107630.0, ans=0.0 2024-08-13 09:50:32,172 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 20 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-13 09:50:38,155 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-13 09:50:38,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-13 09:50:40,143 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-13 09:50:51,257 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-13 09:50:51,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2107830.0, ans=0.125 2024-08-13 09:50:57,663 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-13 09:50:59,582 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-13 09:51:00,749 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 13 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-13 09:51:00,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2107830.0, ans=0.035 2024-08-13 09:51:01,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2024-08-13 09:51:08,214 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7900, loss[loss=0.1052, beats_loss=0.01332, ecapa_loss=0.0001494, whisper_loss=0.09038, over 23144.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01089, ecapa_loss=0.0001648, whisper_loss=0.09124, over 3909547.75 frames. ], batch size: 93, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:51:19,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2107930.0, ans=0.125 2024-08-13 09:51:26,008 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-13 09:51:36,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=15.0 2024-08-13 09:51:40,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2108130.0, ans=0.0 2024-08-13 09:51:49,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2108130.0, ans=0.0 2024-08-13 09:51:52,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2108130.0, ans=0.0 2024-08-13 09:52:00,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2108230.0, ans=0.125 2024-08-13 09:52:02,719 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 09:52:07,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2108230.0, ans=0.0 2024-08-13 09:52:19,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.346e+01 2.630e+01 3.151e+01 7.356e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-13 09:52:22,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2108330.0, ans=0.2 2024-08-13 09:52:26,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 7950, loss[loss=0.1078, beats_loss=0.01043, ecapa_loss=0.0001613, whisper_loss=0.09574, over 22276.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01095, ecapa_loss=0.0001645, whisper_loss=0.09091, over 3878194.67 frames. ], batch size: 89, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:52:30,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. 
limit=15.0 2024-08-13 09:53:08,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2108630.0, ans=0.0 2024-08-13 09:53:26,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2108730.0, ans=0.95 2024-08-13 09:53:36,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2108830.0, ans=0.1 2024-08-13 09:53:45,242 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8000, loss[loss=0.09774, beats_loss=0.0113, ecapa_loss=0.0001455, whisper_loss=0.08498, over 22777.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01084, ecapa_loss=0.0001651, whisper_loss=0.09171, over 3866501.55 frames. ], batch size: 91, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:53:56,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2108930.0, ans=0.07 2024-08-13 09:54:11,801 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 09:54:26,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2109130.0, ans=0.2 2024-08-13 09:54:43,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2109230.0, ans=0.125 2024-08-13 09:54:45,909 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 
17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-13 09:54:46,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2109330.0, ans=0.125 2024-08-13 09:54:46,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2109330.0, ans=0.0 2024-08-13 09:54:56,453 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.293e+01 2.578e+01 2.886e+01 4.471e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-13 09:55:00,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2109330.0, ans=0.125 2024-08-13 09:55:02,805 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8050, loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.000164, whisper_loss=0.0887, over 22780.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001643, whisper_loss=0.09184, over 3886113.29 frames. ], batch size: 90, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:55:27,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2109530.0, ans=0.125 2024-08-13 09:55:30,465 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-13 09:55:35,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-08-13 09:55:36,857 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 
16 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-13 09:55:39,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2109630.0, ans=0.1 2024-08-13 09:55:46,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2109630.0, ans=0.125 2024-08-13 09:55:52,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-13 09:56:06,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2109830.0, ans=0.1 2024-08-13 09:56:17,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2109830.0, ans=0.125 2024-08-13 09:56:20,563 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8100, loss[loss=0.07091, beats_loss=0.01293, ecapa_loss=0.000175, whisper_loss=0.05623, over 14545.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01087, ecapa_loss=0.0001635, whisper_loss=0.09074, over 3915799.43 frames. ], batch size: 59, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:56:35,052 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-13 09:56:45,556 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 09:56:52,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110130.0, ans=0.1 2024-08-13 09:56:53,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2110130.0, ans=0.0 2024-08-13 09:57:00,931 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-13 09:57:03,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2110130.0, ans=0.0 2024-08-13 09:57:24,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=2110330.0, ans=6.0 2024-08-13 09:57:24,833 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-13 09:57:26,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110330.0, ans=0.1 2024-08-13 09:57:29,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2110330.0, ans=0.0 2024-08-13 09:57:30,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.445e+01 2.691e+01 3.022e+01 6.409e+01, threshold=5.382e+01, percent-clipped=1.0 2024-08-13 09:57:36,923 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8150, loss[loss=0.1146, beats_loss=0.01098, ecapa_loss=0.0001682, whisper_loss=0.1019, over 23285.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0108, ecapa_loss=0.0001645, whisper_loss=0.09053, over 3896823.35 frames. ], batch size: 90, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:57:51,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2110530.0, ans=0.125 2024-08-13 09:57:55,384 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
17 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-13 09:58:01,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2110530.0, ans=0.1 2024-08-13 09:58:01,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2110530.0, ans=0.2 2024-08-13 09:58:26,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2110730.0, ans=0.125 2024-08-13 09:58:42,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2110830.0, ans=0.0 2024-08-13 09:58:46,367 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-13 09:58:54,372 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8200, loss[loss=0.1069, beats_loss=0.01114, ecapa_loss=0.0001278, whisper_loss=0.09449, over 21383.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01076, ecapa_loss=0.0001648, whisper_loss=0.09089, over 3905286.55 frames. ], batch size: 83, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 09:59:01,511 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 09:59:01,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2110930.0, ans=0.125 2024-08-13 09:59:06,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2110930.0, ans=0.125 2024-08-13 09:59:26,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.39 vs. 
limit=10.0 2024-08-13 09:59:39,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2111230.0, ans=0.125 2024-08-13 09:59:44,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2111230.0, ans=0.125 2024-08-13 09:59:48,455 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 09:59:56,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2111330.0, ans=0.0 2024-08-13 10:00:08,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.520e+01 2.689e+01 2.972e+01 4.311e+01, threshold=5.378e+01, percent-clipped=0.0 2024-08-13 10:00:14,739 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8250, loss[loss=0.1092, beats_loss=0.005664, ecapa_loss=0.0002169, whisper_loss=0.1014, over 13394.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01079, ecapa_loss=0.0001649, whisper_loss=0.09049, over 3909918.79 frames. 
], batch size: 54, lr: 4.18e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:00:47,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2111630.0, ans=0.125 2024-08-13 10:01:01,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2111730.0, ans=0.1 2024-08-13 10:01:01,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2111730.0, ans=0.125 2024-08-13 10:01:01,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2111730.0, ans=0.0 2024-08-13 10:01:21,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2111830.0, ans=0.2 2024-08-13 10:01:23,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2111830.0, ans=0.125 2024-08-13 10:01:27,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2111830.0, ans=0.125 2024-08-13 10:01:29,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2024-08-13 10:01:35,588 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8300, loss[loss=0.1116, beats_loss=0.01266, ecapa_loss=0.0001538, whisper_loss=0.0974, over 22647.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01082, ecapa_loss=0.0001641, whisper_loss=0.09029, over 3861587.04 frames. 
], batch size: 91, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:01:38,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2111930.0, ans=0.125 2024-08-13 10:01:49,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2112030.0, ans=0.125 2024-08-13 10:01:52,318 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 28 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 10:01:56,757 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 10:02:04,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2112130.0, ans=0.125 2024-08-13 10:02:16,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2024-08-13 10:02:26,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2112230.0, ans=0.125 2024-08-13 10:02:30,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2024-08-13 10:02:39,207 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-13 10:02:46,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.390e+01 2.767e+01 3.084e+01 3.775e+01, threshold=5.535e+01, percent-clipped=0.0 2024-08-13 10:02:52,859 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8350, loss[loss=0.08835, beats_loss=0.01097, ecapa_loss=0.0001714, whisper_loss=0.07566, over 16845.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001649, whisper_loss=0.09089, over 3872010.38 frames. 
], batch size: 68, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:03:02,983 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-13 10:03:27,706 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 30 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-13 10:03:37,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2112730.0, ans=0.125 2024-08-13 10:03:40,706 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 26 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-13 10:03:45,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2112730.0, ans=0.2 2024-08-13 10:04:10,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8400, loss[loss=0.1165, beats_loss=0.009126, ecapa_loss=0.0001614, whisper_loss=0.1058, over 16698.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01075, ecapa_loss=0.0001659, whisper_loss=0.09166, over 3885835.64 frames. ], batch size: 66, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:04:19,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2112930.0, ans=0.125 2024-08-13 10:04:30,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2113030.0, ans=0.0 2024-08-13 10:04:31,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2024-08-13 10:04:49,253 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
24 from LS+wenet, 24 from Vox, 34 from AS 2024-08-13 10:04:54,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2113130.0, ans=0.125 2024-08-13 10:04:57,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2113230.0, ans=0.95 2024-08-13 10:04:57,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2113230.0, ans=0.125 2024-08-13 10:04:58,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.86 vs. limit=10.0 2024-08-13 10:05:00,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.65 vs. limit=15.0 2024-08-13 10:05:05,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2113230.0, ans=0.125 2024-08-13 10:05:06,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2024-08-13 10:05:22,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.471e+01 2.703e+01 3.041e+01 5.042e+01, threshold=5.407e+01, percent-clipped=0.0 2024-08-13 10:05:28,252 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8450, loss[loss=0.09065, beats_loss=0.01219, ecapa_loss=0.0001305, whisper_loss=0.07715, over 16415.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01074, ecapa_loss=0.0001645, whisper_loss=0.09212, over 3898756.53 frames. 
], batch size: 62, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:05:28,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2113430.0, ans=0.1 2024-08-13 10:05:38,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=12.0 2024-08-13 10:06:20,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2113730.0, ans=0.125 2024-08-13 10:06:24,506 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 from AS 2024-08-13 10:06:37,253 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 from AS 2024-08-13 10:06:48,884 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8500, loss[loss=0.07582, beats_loss=0.01188, ecapa_loss=0.0001655, whisper_loss=0.06229, over 16143.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01071, ecapa_loss=0.0001649, whisper_loss=0.09213, over 3904301.83 frames. ], batch size: 65, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:07:03,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2113930.0, ans=0.125 2024-08-13 10:07:13,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2114030.0, ans=0.2 2024-08-13 10:07:15,950 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-13 10:07:27,031 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 from AS 2024-08-13 10:07:30,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. 
limit=22.5 2024-08-13 10:07:45,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2114230.0, ans=0.125 2024-08-13 10:07:47,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2114230.0, ans=0.125 2024-08-13 10:07:57,930 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 27 from LS+wenet, 20 from Vox, 37 from AS 2024-08-13 10:08:01,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2114330.0, ans=0.2 2024-08-13 10:08:04,263 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.378e+01 2.649e+01 2.972e+01 5.253e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-13 10:08:10,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8550, loss[loss=0.08566, beats_loss=0.01179, ecapa_loss=0.00017, whisper_loss=0.07217, over 14767.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01072, ecapa_loss=0.0001647, whisper_loss=0.09207, over 3896543.33 frames. ], batch size: 63, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:08:32,391 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
34 from LS+wenet, 16 from Vox, 38 from AS 2024-08-13 10:08:35,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2114530.0, ans=0.125 2024-08-13 10:08:37,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2114530.0, ans=0.0 2024-08-13 10:08:40,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2114630.0, ans=0.125 2024-08-13 10:08:46,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2114630.0, ans=0.125 2024-08-13 10:08:54,787 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 from AS 2024-08-13 10:08:58,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-13 10:09:03,665 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 from AS 2024-08-13 10:09:14,821 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 10:09:30,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-08-13 10:09:31,215 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8600, loss[loss=0.115, beats_loss=0.01098, ecapa_loss=0.0001478, whisper_loss=0.1026, over 19548.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01079, ecapa_loss=0.0001646, whisper_loss=0.09174, over 3888363.12 frames. ], batch size: 74, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:09:36,465 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 
26 from LS+wenet, 24 from Vox, 25 from AS 2024-08-13 10:09:54,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2115030.0, ans=0.07 2024-08-13 10:10:08,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-13 10:10:10,519 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 16 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 10:10:12,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2024-08-13 10:10:17,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2115130.0, ans=0.125 2024-08-13 10:10:19,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2115230.0, ans=0.125 2024-08-13 10:10:25,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2115230.0, ans=0.2 2024-08-13 10:10:26,494 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 from AS 2024-08-13 10:10:27,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-08-13 10:10:44,048 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 from AS 2024-08-13 10:10:45,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.403e+01 2.760e+01 3.057e+01 6.734e+01, threshold=5.520e+01, percent-clipped=3.0 2024-08-13 10:10:51,404 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8650, loss[loss=0.1223, beats_loss=0.009657, ecapa_loss=0.0001445, whisper_loss=0.1112, over 21331.00 frames. 
], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001653, whisper_loss=0.09178, over 3865044.39 frames. ], batch size: 82, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:10:51,520 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-13 10:10:53,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2115430.0, ans=0.125 2024-08-13 10:11:17,192 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 from AS 2024-08-13 10:11:26,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2115630.0, ans=0.0 2024-08-13 10:11:28,880 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS 2024-08-13 10:11:38,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-13 10:11:39,470 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 13 from Vox, 28 from AS 2024-08-13 10:11:42,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2115730.0, ans=0.125 2024-08-13 10:11:43,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2115730.0, ans=0.0 2024-08-13 10:12:08,314 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8700, loss[loss=0.1142, beats_loss=0.01143, ecapa_loss=0.0001496, whisper_loss=0.1013, over 23853.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01075, ecapa_loss=0.0001654, whisper_loss=0.09205, over 3850214.64 frames. 
], batch size: 93, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:12:20,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-13 10:12:20,816 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 from AS 2024-08-13 10:12:22,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2115930.0, ans=0.125 2024-08-13 10:12:28,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2116030.0, ans=0.125 2024-08-13 10:12:36,008 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 24 from LS+wenet, 16 from Vox, 23 from AS 2024-08-13 10:12:40,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2116130.0, ans=0.125 2024-08-13 10:12:40,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2116130.0, ans=0.125 2024-08-13 10:12:56,868 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 10:13:12,303 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
33 from LS+wenet, 27 from Vox, 29 from AS 2024-08-13 10:13:15,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2116330.0, ans=0.125 2024-08-13 10:13:15,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2116330.0, ans=0.125 2024-08-13 10:13:18,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2116330.0, ans=0.125 2024-08-13 10:13:24,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.443e+01 2.656e+01 3.130e+01 5.733e+01, threshold=5.311e+01, percent-clipped=2.0 2024-08-13 10:13:30,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8750, loss[loss=0.1022, beats_loss=0.008682, ecapa_loss=0.0001704, whisper_loss=0.09181, over 14553.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01075, ecapa_loss=0.0001643, whisper_loss=0.09216, over 3859056.96 frames. ], batch size: 57, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:13:39,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2116430.0, ans=0.125 2024-08-13 10:13:56,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2116530.0, ans=0.125 2024-08-13 10:14:08,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=15.0 2024-08-13 10:14:10,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2116630.0, ans=0.0 2024-08-13 10:14:13,600 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
25 from LS+wenet, 21 from Vox, 43 from AS 2024-08-13 10:14:20,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2024-08-13 10:14:23,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2116730.0, ans=0.125 2024-08-13 10:14:29,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2116730.0, ans=0.0 2024-08-13 10:14:36,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2116830.0, ans=0.04949747468305833 2024-08-13 10:14:40,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2116830.0, ans=0.2 2024-08-13 10:14:45,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2116830.0, ans=0.0 2024-08-13 10:14:49,854 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8800, loss[loss=0.1215, beats_loss=0.008128, ecapa_loss=0.0001725, whisper_loss=0.1117, over 14390.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01086, ecapa_loss=0.0001637, whisper_loss=0.09233, over 3874262.16 frames. ], batch size: 54, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:14:52,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2116930.0, ans=0.125 2024-08-13 10:15:13,060 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
18 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 10:15:14,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2117030.0, ans=0.1 2024-08-13 10:15:19,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=12.0 2024-08-13 10:15:20,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-08-13 10:15:22,777 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.563e-01 2024-08-13 10:15:35,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2117130.0, ans=0.2 2024-08-13 10:15:39,955 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 10:15:56,550 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 from AS 2024-08-13 10:16:01,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2117330.0, ans=0.125 2024-08-13 10:16:04,703 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 10:16:06,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.410e+01 2.636e+01 2.976e+01 1.522e+02, threshold=5.272e+01, percent-clipped=1.0 2024-08-13 10:16:08,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2117330.0, ans=10.0 2024-08-13 10:16:13,299 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8850, loss[loss=0.09891, beats_loss=0.01111, ecapa_loss=0.0001515, whisper_loss=0.08629, over 19395.00 frames. 
], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.0001636, whisper_loss=0.09175, over 3872380.39 frames. ], batch size: 78, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:16:21,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2117430.0, ans=0.0 2024-08-13 10:16:36,258 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 from AS 2024-08-13 10:16:39,610 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 from AS 2024-08-13 10:16:48,482 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 10:17:02,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2117730.0, ans=0.125 2024-08-13 10:17:10,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-08-13 10:17:25,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-08-13 10:17:34,219 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8900, loss[loss=0.09795, beats_loss=0.0119, ecapa_loss=0.0001183, whisper_loss=0.08487, over 23953.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01097, ecapa_loss=0.0001636, whisper_loss=0.09106, over 3862238.26 frames. 
], batch size: 92, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:17:38,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2117930.0, ans=0.125 2024-08-13 10:17:40,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2117930.0, ans=0.05 2024-08-13 10:18:01,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2118030.0, ans=0.2 2024-08-13 10:18:05,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2118130.0, ans=0.0 2024-08-13 10:18:10,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2118130.0, ans=0.2 2024-08-13 10:18:22,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2118230.0, ans=0.125 2024-08-13 10:18:41,915 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 18 from LS+wenet, 14 from Vox, 35 from AS 2024-08-13 10:18:43,408 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 19 from LS+wenet, 23 from Vox, 48 from AS 2024-08-13 10:18:48,336 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.342e+01 2.664e+01 2.910e+01 6.216e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-13 10:18:49,773 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 23 from LS+wenet, 32 from Vox, 40 from AS 2024-08-13 10:18:54,507 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 8950, loss[loss=0.1157, beats_loss=0.00902, ecapa_loss=0.0002048, whisper_loss=0.1046, over 19930.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001658, whisper_loss=0.09148, over 3880412.90 frames. 
], batch size: 82, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:19:08,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2118430.0, ans=0.2 2024-08-13 10:19:12,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2118530.0, ans=0.125 2024-08-13 10:19:27,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=12.0 2024-08-13 10:19:35,496 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 from AS 2024-08-13 10:19:38,661 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 from AS 2024-08-13 10:19:47,192 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 from AS 2024-08-13 10:19:56,719 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-13 10:20:13,256 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9000, loss[loss=0.1139, beats_loss=0.01044, ecapa_loss=0.0001377, whisper_loss=0.1021, over 19468.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01089, ecapa_loss=0.0001662, whisper_loss=0.09138, over 3846376.31 frames. ], batch size: 74, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:20:13,257 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 10:20:54,940 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005617, whisper_loss=0.2479, over 922467.00 frames. 2024-08-13 10:21:13,635 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on SV_voxceleb1: loss=0.004578, beats_loss=0, ecapa_loss=0.0004578, whisper_loss=0, over 939242.00 frames. 
2024-08-13 10:23:02,628 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on AT_audioset: loss=0.02381, beats_loss=0.02381, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 10:23:02,632 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 10:23:17,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-08-13 10:23:22,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=6.0 2024-08-13 10:23:25,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2024-08-13 10:23:42,049 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 from AS 2024-08-13 10:24:01,371 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 31 from LS+wenet, 20 from Vox, 26 from AS 2024-08-13 10:24:03,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2119230.0, ans=0.125 2024-08-13 10:24:15,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2024-08-13 10:24:18,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.388e+01 2.773e+01 3.157e+01 5.459e+01, threshold=5.546e+01, percent-clipped=1.0 2024-08-13 10:24:24,654 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9050, loss[loss=0.08379, beats_loss=0.01153, ecapa_loss=0.0001943, whisper_loss=0.07032, over 17644.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01083, ecapa_loss=0.0001663, whisper_loss=0.09197, over 3875584.17 frames. 
], batch size: 73, lr: 4.17e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:24:32,357 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS 2024-08-13 10:24:32,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2119430.0, ans=0.0 2024-08-13 10:24:32,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2119430.0, ans=0.0 2024-08-13 10:24:49,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2119530.0, ans=0.125 2024-08-13 10:24:59,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2119630.0, ans=0.125 2024-08-13 10:25:01,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2119630.0, ans=0.125 2024-08-13 10:25:02,577 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 from AS 2024-08-13 10:25:02,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2119630.0, ans=0.125 2024-08-13 10:25:41,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2119830.0, ans=0.125 2024-08-13 10:25:44,360 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9100, loss[loss=0.09977, beats_loss=0.01185, ecapa_loss=0.0001164, whisper_loss=0.08676, over 18991.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01081, ecapa_loss=0.0001654, whisper_loss=0.09199, over 3879417.56 frames. 
], batch size: 74, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:25:54,969 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-212000.pt 2024-08-13 10:25:58,457 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:26:03,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2024-08-13 10:26:13,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2120030.0, ans=0.0 2024-08-13 10:26:31,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2120130.0, ans=0.125 2024-08-13 10:26:34,644 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 from AS 2024-08-13 10:26:41,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2120230.0, ans=0.125 2024-08-13 10:26:51,138 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-13 10:27:02,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.361e+01 2.637e+01 2.940e+01 4.647e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-13 10:27:10,225 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9150, loss[loss=0.1287, beats_loss=0.008711, ecapa_loss=0.0001708, whisper_loss=0.1183, over 23652.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.0108, ecapa_loss=0.000166, whisper_loss=0.09227, over 3905353.54 frames. 
], batch size: 87, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:27:10,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2120430.0, ans=0.125 2024-08-13 10:27:14,812 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 17 from LS+wenet, 27 from Vox, 33 from AS 2024-08-13 10:27:30,545 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 24 from Vox, 31 from AS 2024-08-13 10:27:43,991 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 from AS 2024-08-13 10:27:44,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2120630.0, ans=0.0 2024-08-13 10:27:58,702 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 22 from LS+wenet, 26 from Vox, 32 from AS 2024-08-13 10:28:05,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2024-08-13 10:28:18,880 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 10:28:19,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2120830.0, ans=22.5 2024-08-13 10:28:29,877 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9200, loss[loss=0.09319, beats_loss=0.01124, ecapa_loss=0.0001658, whisper_loss=0.08029, over 21469.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01076, ecapa_loss=0.0001658, whisper_loss=0.09244, over 3906749.49 frames. 
], batch size: 90, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:28:41,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2120930.0, ans=0.125 2024-08-13 10:28:42,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2120930.0, ans=0.125 2024-08-13 10:29:20,684 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 26 from LS+wenet, 20 from Vox, 20 from AS 2024-08-13 10:29:41,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.412e+01 2.586e+01 2.944e+01 1.076e+02, threshold=5.171e+01, percent-clipped=1.0 2024-08-13 10:29:42,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2121330.0, ans=0.125 2024-08-13 10:29:44,117 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 14 from LS+wenet, 16 from Vox, 35 from AS 2024-08-13 10:29:48,977 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9250, loss[loss=0.107, beats_loss=0.01008, ecapa_loss=0.0001861, whisper_loss=0.09507, over 21967.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001659, whisper_loss=0.09118, over 3891958.97 frames. ], batch size: 90, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:29:49,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2121430.0, ans=0.2 2024-08-13 10:30:16,248 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-13 10:30:24,024 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
17 from LS+wenet, 16 from Vox, 31 from AS 2024-08-13 10:30:24,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2121630.0, ans=0.1 2024-08-13 10:30:28,881 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 from AS 2024-08-13 10:30:42,373 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-13 10:30:50,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2121730.0, ans=0.015 2024-08-13 10:30:50,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2121730.0, ans=0.125 2024-08-13 10:30:59,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0 2024-08-13 10:31:12,465 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-13 10:31:13,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9300, loss[loss=0.09434, beats_loss=0.009704, ecapa_loss=0.0001917, whisper_loss=0.08272, over 19268.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01078, ecapa_loss=0.0001672, whisper_loss=0.09133, over 3889067.92 frames. ], batch size: 76, lr: 4.17e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:31:29,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-08-13 10:31:31,731 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
20 from LS+wenet, 17 from Vox, 36 from AS 2024-08-13 10:31:43,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2122030.0, ans=0.125 2024-08-13 10:31:52,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2122130.0, ans=0.125 2024-08-13 10:32:08,648 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 29 from Vox, 32 from AS 2024-08-13 10:32:13,648 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 from AS 2024-08-13 10:32:23,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2122330.0, ans=0.0 2024-08-13 10:32:26,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2122330.0, ans=0.125 2024-08-13 10:32:27,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.057e+01 2.387e+01 2.545e+01 2.935e+01 6.659e+01, threshold=5.090e+01, percent-clipped=1.0 2024-08-13 10:32:34,604 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9350, loss[loss=0.1129, beats_loss=0.009807, ecapa_loss=0.0001521, whisper_loss=0.1015, over 19750.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01078, ecapa_loss=0.0001675, whisper_loss=0.09149, over 3870641.73 frames. ], batch size: 75, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:32:43,157 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 from AS 2024-08-13 10:32:57,717 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
20 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 10:33:03,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2122530.0, ans=0.1 2024-08-13 10:33:10,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5 2024-08-13 10:33:27,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2024-08-13 10:33:32,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2122730.0, ans=0.0 2024-08-13 10:33:38,889 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS 2024-08-13 10:33:55,885 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9400, loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.0001722, whisper_loss=0.08858, over 19718.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01077, ecapa_loss=0.0001672, whisper_loss=0.09172, over 3888214.41 frames. ], batch size: 78, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:34:08,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2122930.0, ans=0.125 2024-08-13 10:34:21,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2123030.0, ans=0.1 2024-08-13 10:34:33,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.48 vs. 
limit=15.0 2024-08-13 10:34:38,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2123130.0, ans=0.0 2024-08-13 10:34:42,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2123130.0, ans=0.1 2024-08-13 10:34:55,444 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:34:57,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2123230.0, ans=0.125 2024-08-13 10:34:58,729 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 from AS 2024-08-13 10:35:05,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.51 vs. limit=15.0 2024-08-13 10:35:11,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.356e+01 2.664e+01 2.978e+01 5.324e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-13 10:35:16,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.36 vs. limit=22.5 2024-08-13 10:35:17,153 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9450, loss[loss=0.1088, beats_loss=0.009086, ecapa_loss=0.000213, whisper_loss=0.09763, over 19960.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001669, whisper_loss=0.09093, over 3888570.42 frames. 
], batch size: 85, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:35:22,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2123430.0, ans=0.125 2024-08-13 10:35:33,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2123530.0, ans=0.125 2024-08-13 10:35:38,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2123530.0, ans=0.05 2024-08-13 10:35:52,090 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 from AS 2024-08-13 10:36:02,054 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 21 from LS+wenet, 22 from Vox, 37 from AS 2024-08-13 10:36:08,170 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 19 from LS+wenet, 21 from Vox, 16 from AS 2024-08-13 10:36:25,619 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 from AS 2024-08-13 10:36:42,311 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9500, loss[loss=0.09955, beats_loss=0.01073, ecapa_loss=0.0001829, whisper_loss=0.08699, over 22852.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01081, ecapa_loss=0.0001674, whisper_loss=0.09124, over 3899708.00 frames. ], batch size: 93, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:36:52,309 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 16 from LS+wenet, 19 from Vox, 21 from AS 2024-08-13 10:37:23,009 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-13 10:37:33,232 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 13 from Vox, 40 from AS 2024-08-13 10:37:45,949 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 
22 from LS+wenet, 20 from Vox, 39 from AS 2024-08-13 10:37:48,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2124230.0, ans=0.125 2024-08-13 10:37:48,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2124230.0, ans=0.1 2024-08-13 10:38:06,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-13 10:38:28,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+01 2.383e+01 2.725e+01 3.152e+01 1.098e+02, threshold=5.450e+01, percent-clipped=1.0 2024-08-13 10:38:38,679 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9550, loss[loss=0.0909, beats_loss=0.01191, ecapa_loss=0.000168, whisper_loss=0.07731, over 21720.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01079, ecapa_loss=0.000168, whisper_loss=0.09111, over 3876535.39 frames. ], batch size: 90, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:38:39,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2124430.0, ans=0.125 2024-08-13 10:38:50,899 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 from AS 2024-08-13 10:39:02,773 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
19 from LS+wenet, 17 from Vox, 17 from AS 2024-08-13 10:39:11,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2124530.0, ans=0.1 2024-08-13 10:39:12,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2124530.0, ans=0.0 2024-08-13 10:39:18,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2124530.0, ans=0.125 2024-08-13 10:39:21,518 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 22 from LS+wenet, 10 from Vox, 36 from AS 2024-08-13 10:39:22,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2124530.0, ans=10.0 2024-08-13 10:39:35,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2124630.0, ans=0.09899494936611666 2024-08-13 10:39:38,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2124630.0, ans=0.125 2024-08-13 10:39:53,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2124730.0, ans=0.1 2024-08-13 10:39:58,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2124730.0, ans=0.125 2024-08-13 10:40:03,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2124730.0, ans=0.0 2024-08-13 10:40:13,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2124830.0, ans=0.125 2024-08-13 10:40:27,932 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9600, loss[loss=0.1047, beats_loss=0.01019, 
ecapa_loss=0.0001898, whisper_loss=0.09261, over 22412.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01086, ecapa_loss=0.0001659, whisper_loss=0.09031, over 3868997.45 frames. ], batch size: 88, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:40:30,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2124930.0, ans=0.025 2024-08-13 10:40:32,662 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 17 from LS+wenet, 31 from Vox, 20 from AS 2024-08-13 10:40:38,380 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 from AS 2024-08-13 10:41:14,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2125130.0, ans=0.1 2024-08-13 10:41:25,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2125230.0, ans=0.125 2024-08-13 10:41:37,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2125330.0, ans=0.1 2024-08-13 10:41:40,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2125330.0, ans=0.015 2024-08-13 10:41:47,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.442e+01 2.705e+01 2.957e+01 4.182e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-13 10:41:49,855 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:41:55,725 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9650, loss[loss=0.1065, beats_loss=0.01189, ecapa_loss=0.0001379, whisper_loss=0.09327, over 21006.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01079, ecapa_loss=0.0001653, whisper_loss=0.09058, over 3821771.48 frames. 
], batch size: 82, lr: 4.16e-03, grad_scale: 1.152921504606847e+18 2024-08-13 10:42:07,586 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 from AS 2024-08-13 10:42:39,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2024-08-13 10:42:46,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-13 10:42:47,617 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 28 from LS+wenet, 26 from Vox, 29 from AS 2024-08-13 10:42:59,844 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 30 from Vox, 21 from AS 2024-08-13 10:43:27,820 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9700, loss[loss=0.1105, beats_loss=0.008547, ecapa_loss=0.0001852, whisper_loss=0.1001, over 21524.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01086, ecapa_loss=0.0001656, whisper_loss=0.09042, over 3849007.44 frames. ], batch size: 87, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:43:29,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2125930.0, ans=0.015 2024-08-13 10:43:34,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2125930.0, ans=0.0 2024-08-13 10:43:40,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2125930.0, ans=0.1 2024-08-13 10:44:31,191 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 23 from Vox, 24 from AS 2024-08-13 10:44:35,705 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 21 from Vox, 30 from AS 2024-08-13 10:44:39,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2126230.0, ans=0.07 2024-08-13 10:45:00,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=12.0 2024-08-13 10:45:03,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2126330.0, ans=0.0 2024-08-13 10:45:09,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.435e+01 2.595e+01 3.006e+01 3.939e+01, threshold=5.189e+01, percent-clipped=0.0 2024-08-13 10:45:09,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2126330.0, ans=0.125 2024-08-13 10:45:15,834 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:45:16,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9750, loss[loss=0.1041, beats_loss=0.01239, ecapa_loss=0.0001304, whisper_loss=0.09035, over 20840.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01086, ecapa_loss=0.000165, whisper_loss=0.09053, over 3830405.49 frames. ], batch size: 81, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:45:22,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.47 vs. 
limit=15.0 2024-08-13 10:45:40,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2126530.0, ans=0.125 2024-08-13 10:45:52,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2126530.0, ans=0.0 2024-08-13 10:46:04,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2126630.0, ans=0.0 2024-08-13 10:46:13,837 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 23 from LS+wenet, 18 from Vox, 43 from AS 2024-08-13 10:46:53,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=12.0 2024-08-13 10:46:54,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2126830.0, ans=0.1 2024-08-13 10:46:56,009 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 from AS 2024-08-13 10:47:07,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2126830.0, ans=0.2 2024-08-13 10:47:12,456 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9800, loss[loss=0.09917, beats_loss=0.01326, ecapa_loss=0.0001194, whisper_loss=0.08472, over 23752.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01091, ecapa_loss=0.0001654, whisper_loss=0.09023, over 3826480.00 frames. ], batch size: 94, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:47:20,025 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 28 from Vox, 31 from AS 2024-08-13 10:47:25,153 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 15 from LS+wenet, 23 from Vox, 35 from AS 2024-08-13 10:47:32,284 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 
26 from LS+wenet, 22 from Vox, 45 from AS 2024-08-13 10:47:34,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2127030.0, ans=0.1 2024-08-13 10:47:41,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2127030.0, ans=0.025 2024-08-13 10:47:53,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2127030.0, ans=0.0 2024-08-13 10:48:12,148 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-13 10:48:37,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2127230.0, ans=0.125 2024-08-13 10:48:51,668 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 10:48:52,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.51 vs. limit=22.5 2024-08-13 10:49:04,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.353e+01 2.628e+01 3.072e+01 7.221e+01, threshold=5.255e+01, percent-clipped=1.0 2024-08-13 10:49:12,321 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9850, loss[loss=0.1194, beats_loss=0.01013, ecapa_loss=0.0001622, whisper_loss=0.1076, over 17826.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01095, ecapa_loss=0.0001652, whisper_loss=0.09021, over 3839775.20 frames. 
], batch size: 70, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:49:55,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2127530.0, ans=0.0 2024-08-13 10:50:00,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2127630.0, ans=0.125 2024-08-13 10:50:03,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2127630.0, ans=0.0 2024-08-13 10:50:09,946 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 from AS 2024-08-13 10:50:21,183 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 from AS 2024-08-13 10:50:29,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2127730.0, ans=0.0 2024-08-13 10:50:32,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.23 vs. limit=10.0 2024-08-13 10:50:52,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2127830.0, ans=0.0 2024-08-13 10:50:54,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2127830.0, ans=0.125 2024-08-13 10:50:58,829 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 32 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 10:51:00,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2127830.0, ans=0.0 2024-08-13 10:51:05,639 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9900, loss[loss=0.1225, beats_loss=0.008644, ecapa_loss=0.0001802, whisper_loss=0.112, over 20552.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01095, ecapa_loss=0.0001654, whisper_loss=0.09053, over 3843694.43 frames. ], batch size: 81, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:51:23,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2128030.0, ans=0.0 2024-08-13 10:52:18,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.402e+01 2.725e+01 3.042e+01 4.728e+01, threshold=5.451e+01, percent-clipped=0.0 2024-08-13 10:52:23,179 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 9950, loss[loss=0.08567, beats_loss=0.009667, ecapa_loss=0.0001833, whisper_loss=0.07417, over 12884.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001659, whisper_loss=0.09099, over 3871231.13 frames. ], batch size: 53, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:52:27,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2128430.0, ans=0.0 2024-08-13 10:52:37,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-13 10:52:37,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2128430.0, ans=0.125 2024-08-13 10:52:57,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2128630.0, ans=0.035 2024-08-13 10:53:31,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2024-08-13 10:53:42,744 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10000, loss[loss=0.08988, beats_loss=0.01275, ecapa_loss=0.0001779, whisper_loss=0.07536, over 19317.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01089, ecapa_loss=0.0001666, whisper_loss=0.091, over 3855481.56 frames. ], batch size: 81, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:53:45,692 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 from AS 2024-08-13 10:53:45,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2128930.0, ans=0.125 2024-08-13 10:53:47,246 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 from AS 2024-08-13 10:53:54,111 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 from AS 2024-08-13 10:53:55,358 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS 2024-08-13 10:54:07,634 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 25 from LS+wenet, 20 from Vox, 23 from AS 2024-08-13 10:54:12,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2129130.0, ans=0.0 2024-08-13 10:54:17,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2129130.0, ans=0.125 2024-08-13 10:54:34,839 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 from AS 2024-08-13 10:54:45,792 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 24 from Vox, 31 from AS 2024-08-13 10:54:49,892 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 from AS 2024-08-13 10:54:57,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.402e+01 2.704e+01 2.977e+01 9.053e+01, threshold=5.409e+01, percent-clipped=1.0 2024-08-13 10:55:00,782 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 
29 from LS+wenet, 18 from Vox, 36 from AS 2024-08-13 10:55:02,773 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10050, loss[loss=0.1146, beats_loss=0.01042, ecapa_loss=0.000127, whisper_loss=0.1029, over 21783.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001665, whisper_loss=0.09098, over 3831828.20 frames. ], batch size: 83, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:55:04,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-13 10:55:06,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-08-13 10:55:10,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2129430.0, ans=0.125 2024-08-13 10:55:28,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2129530.0, ans=0.0 2024-08-13 10:55:40,211 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 16 from Vox, 44 from AS 2024-08-13 10:55:44,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2129630.0, ans=0.125 2024-08-13 10:55:49,948 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 from AS 2024-08-13 10:55:59,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2129730.0, ans=22.5 2024-08-13 10:56:11,006 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
25 from LS+wenet, 23 from Vox, 29 from AS 2024-08-13 10:56:25,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10100, loss[loss=0.1129, beats_loss=0.01037, ecapa_loss=0.0001617, whisper_loss=0.1009, over 13556.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01085, ecapa_loss=0.0001669, whisper_loss=0.09075, over 3857778.96 frames. ], batch size: 54, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:56:36,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2129930.0, ans=0.1 2024-08-13 10:56:36,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-13 10:56:49,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2130030.0, ans=0.1 2024-08-13 10:57:10,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2130130.0, ans=0.0 2024-08-13 10:57:10,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-13 10:57:14,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2130230.0, ans=0.2 2024-08-13 10:57:26,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2130230.0, ans=0.0 2024-08-13 10:57:34,261 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 28 from LS+wenet, 15 from Vox, 21 from AS 2024-08-13 10:57:37,715 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
21 from LS+wenet, 15 from Vox, 37 from AS 2024-08-13 10:57:39,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2130330.0, ans=0.2 2024-08-13 10:57:42,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.392e+01 2.656e+01 2.956e+01 4.246e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-13 10:57:46,733 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10150, loss[loss=0.08806, beats_loss=0.01147, ecapa_loss=0.0002111, whisper_loss=0.07448, over 19195.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01082, ecapa_loss=0.0001663, whisper_loss=0.09108, over 3858220.43 frames. ], batch size: 83, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:57:49,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2130430.0, ans=0.125 2024-08-13 10:57:49,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2130430.0, ans=0.125 2024-08-13 10:58:00,985 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-13 10:58:03,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2130530.0, ans=0.125 2024-08-13 10:58:12,620 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 28 from Vox, 32 from AS 2024-08-13 10:58:22,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2024-08-13 10:58:24,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. 
limit=15.0 2024-08-13 10:58:36,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2130730.0, ans=0.125 2024-08-13 10:59:06,767 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10200, loss[loss=0.09925, beats_loss=0.009481, ecapa_loss=0.0002244, whisper_loss=0.08752, over 21488.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0108, ecapa_loss=0.0001672, whisper_loss=0.09094, over 3848816.27 frames. ], batch size: 93, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 10:59:20,913 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 19 from LS+wenet, 25 from Vox, 34 from AS 2024-08-13 10:59:24,091 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS 2024-08-13 10:59:35,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2131030.0, ans=0.125 2024-08-13 10:59:43,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2131130.0, ans=0.0 2024-08-13 10:59:53,525 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 24 from LS+wenet, 13 from Vox, 20 from AS 2024-08-13 10:59:58,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2131230.0, ans=0.125 2024-08-13 11:00:02,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2131230.0, ans=0.04949747468305833 2024-08-13 11:00:07,076 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
24 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-13 11:00:14,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2131330.0, ans=0.125 2024-08-13 11:00:22,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.966e+01 2.425e+01 2.688e+01 3.008e+01 5.255e+01, threshold=5.377e+01, percent-clipped=0.0 2024-08-13 11:00:27,129 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10250, loss[loss=0.09786, beats_loss=0.01157, ecapa_loss=0.0001162, whisper_loss=0.08513, over 17637.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.0001662, whisper_loss=0.0905, over 3884444.21 frames. ], batch size: 66, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:00:35,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-08-13 11:00:54,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2131530.0, ans=0.0 2024-08-13 11:01:11,449 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 11:01:11,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2131630.0, ans=0.125 2024-08-13 11:01:34,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2131830.0, ans=0.125 2024-08-13 11:01:49,434 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10300, loss[loss=0.09109, beats_loss=0.01342, ecapa_loss=0.000139, whisper_loss=0.07628, over 23274.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01094, ecapa_loss=0.000166, whisper_loss=0.08977, over 3897739.59 frames. ], batch size: 90, lr: 4.16e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:01:58,997 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-13 11:02:00,061 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-13 11:02:07,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2132030.0, ans=0.125 2024-08-13 11:02:16,585 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-13 11:02:31,105 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 11:02:45,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2132230.0, ans=0.0 2024-08-13 11:02:51,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2132330.0, ans=0.125 2024-08-13 11:03:03,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.417e+01 2.741e+01 3.040e+01 4.375e+02, threshold=5.481e+01, percent-clipped=2.0 2024-08-13 11:03:07,871 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10350, loss[loss=0.08347, beats_loss=0.01064, ecapa_loss=0.0001579, whisper_loss=0.07125, over 18699.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01093, ecapa_loss=0.0001646, whisper_loss=0.09041, over 3907310.51 frames. ], batch size: 77, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:03:21,902 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 11:03:33,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2132530.0, ans=0.125 2024-08-13 11:03:51,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2132630.0, ans=0.1 2024-08-13 11:04:05,944 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-13 11:04:20,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=22.5 2024-08-13 11:04:23,667 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-13 11:04:24,898 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10400, loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001786, whisper_loss=0.08993, over 20658.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01089, ecapa_loss=0.0001643, whisper_loss=0.09059, over 3911235.63 frames. ], batch size: 86, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:04:46,510 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 11:05:02,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2133130.0, ans=0.125 2024-08-13 11:05:22,885 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 11:05:37,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.016e+01 2.409e+01 2.723e+01 2.969e+01 5.956e+01, threshold=5.446e+01, percent-clipped=1.0 2024-08-13 11:05:42,390 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10450, loss[loss=0.09283, beats_loss=0.009989, ecapa_loss=0.0002058, whisper_loss=0.08079, over 14200.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01091, ecapa_loss=0.0001646, whisper_loss=0.09048, over 3887913.38 frames. ], batch size: 59, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:05:44,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2133430.0, ans=0.125 2024-08-13 11:05:44,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2133430.0, ans=0.1 2024-08-13 11:05:56,837 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 22 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-13 11:06:08,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2133530.0, ans=0.0 2024-08-13 11:06:09,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2133530.0, ans=0.125 2024-08-13 11:06:23,341 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 11:06:29,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2133730.0, ans=0.1 2024-08-13 11:06:32,410 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 30 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-13 11:06:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2133830.0, ans=0.125 2024-08-13 11:06:54,705 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 11:06:58,833 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10500, loss[loss=0.1029, beats_loss=0.009921, ecapa_loss=0.0001726, whisper_loss=0.09123, over 23001.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001647, whisper_loss=0.09117, over 3885827.48 frames. 
], batch size: 95, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:07:06,312 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-13 11:07:23,518 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-13 11:07:26,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2134030.0, ans=0.125 2024-08-13 11:07:27,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-08-13 11:07:37,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2134130.0, ans=0.2 2024-08-13 11:07:38,919 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 11:07:47,004 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 17 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-13 11:07:53,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2134230.0, ans=0.125 2024-08-13 11:08:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2134330.0, ans=0.125 2024-08-13 11:08:10,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-08-13 11:08:12,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.440e+01 2.652e+01 2.992e+01 8.819e+01, threshold=5.304e+01, percent-clipped=1.0 2024-08-13 11:08:15,668 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 
25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-13 11:08:17,226 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10550, loss[loss=0.09966, beats_loss=0.0115, ecapa_loss=0.0001524, whisper_loss=0.08664, over 21135.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01088, ecapa_loss=0.0001648, whisper_loss=0.09074, over 3863549.96 frames. ], batch size: 84, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:08:17,430 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 11:08:27,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2134430.0, ans=0.0 2024-08-13 11:08:28,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2134430.0, ans=0.0 2024-08-13 11:08:33,777 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 10 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 11:08:33,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2134530.0, ans=0.125 2024-08-13 11:08:35,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2134530.0, ans=0.1 2024-08-13 11:08:38,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2134530.0, ans=0.125 2024-08-13 11:08:40,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2134530.0, ans=0.2 2024-08-13 11:08:47,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2134530.0, ans=0.1 2024-08-13 11:08:55,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.40 vs. 
limit=15.0 2024-08-13 11:08:57,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2134630.0, ans=0.125 2024-08-13 11:09:02,714 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 11:09:19,008 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-13 11:09:19,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5 2024-08-13 11:09:22,320 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 11:09:30,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2134830.0, ans=0.2 2024-08-13 11:09:38,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10600, loss[loss=0.09909, beats_loss=0.01004, ecapa_loss=0.0001455, whisper_loss=0.08759, over 21714.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001644, whisper_loss=0.09068, over 3890349.42 frames. ], batch size: 82, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:09:40,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2134930.0, ans=0.125 2024-08-13 11:09:42,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2134930.0, ans=0.5 2024-08-13 11:09:51,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.90 vs. 
limit=22.5 2024-08-13 11:10:32,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2135230.0, ans=0.1 2024-08-13 11:10:33,915 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 11:10:36,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2135230.0, ans=0.2 2024-08-13 11:10:46,635 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-13 11:10:46,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2135330.0, ans=0.125 2024-08-13 11:10:52,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.455e+01 2.918e+01 3.137e+01 4.464e+01, threshold=5.836e+01, percent-clipped=0.0 2024-08-13 11:10:57,066 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10650, loss[loss=0.09952, beats_loss=0.01118, ecapa_loss=0.0001739, whisper_loss=0.0866, over 19485.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01085, ecapa_loss=0.0001634, whisper_loss=0.09072, over 3843751.13 frames. ], batch size: 78, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:11:07,976 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 11:11:29,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2135630.0, ans=0.0 2024-08-13 11:11:58,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.74 vs. limit=22.5 2024-08-13 11:12:12,608 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 11:12:13,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2135830.0, ans=0.1 2024-08-13 11:12:15,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10700, loss[loss=0.09369, beats_loss=0.01225, ecapa_loss=0.0001793, whisper_loss=0.07965, over 14981.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001628, whisper_loss=0.09141, over 3851764.04 frames. ], batch size: 61, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:12:36,554 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 11:12:44,720 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-13 11:12:45,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2136130.0, ans=0.1 2024-08-13 11:12:52,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2136130.0, ans=0.0 2024-08-13 11:13:24,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2136330.0, ans=0.125 2024-08-13 11:13:26,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.457e+01 2.823e+01 3.286e+01 3.691e+02, threshold=5.645e+01, percent-clipped=1.0 2024-08-13 11:13:28,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2136330.0, ans=0.125 2024-08-13 11:13:31,240 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10750, loss[loss=0.1141, beats_loss=0.0119, ecapa_loss=0.0001674, whisper_loss=0.1005, over 22942.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01084, ecapa_loss=0.0001637, whisper_loss=0.0916, over 3857004.97 frames. 
], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:13:40,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2136430.0, ans=0.1 2024-08-13 11:13:56,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2136530.0, ans=0.0 2024-08-13 11:14:08,952 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 37 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-13 11:14:13,425 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-13 11:14:47,093 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10800, loss[loss=0.09938, beats_loss=0.01341, ecapa_loss=0.0001467, whisper_loss=0.0845, over 22340.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01075, ecapa_loss=0.0001645, whisper_loss=0.09256, over 3878289.61 frames. ], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:14:53,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-13 11:15:06,168 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-13 11:15:18,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2137130.0, ans=0.05 2024-08-13 11:15:26,496 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-13 11:15:28,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2137130.0, ans=0.0 2024-08-13 11:15:37,805 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 
22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-13 11:15:43,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2137230.0, ans=0.125 2024-08-13 11:15:45,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2137330.0, ans=0.125 2024-08-13 11:15:49,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.89 vs. limit=22.5 2024-08-13 11:15:56,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.544e+01 2.753e+01 3.369e+01 1.648e+02, threshold=5.506e+01, percent-clipped=4.0 2024-08-13 11:16:00,882 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10850, loss[loss=0.1176, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.105, over 17886.00 frames. ], tot_loss[loss=0.1055, beats_loss=0.01069, ecapa_loss=0.0001655, whisper_loss=0.09319, over 3890037.91 frames. ], batch size: 69, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:16:06,880 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 24 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-13 11:16:13,635 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-13 11:16:22,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2137530.0, ans=10.0 2024-08-13 11:16:22,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2137530.0, ans=0.07 2024-08-13 11:16:23,890 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 28 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-13 11:16:33,902 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 11:17:03,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-13 11:17:07,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2024-08-13 11:17:11,326 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 11:17:16,516 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10900, loss[loss=0.1029, beats_loss=0.009579, ecapa_loss=0.0001893, whisper_loss=0.09143, over 22845.00 frames. ], tot_loss[loss=0.1056, beats_loss=0.0107, ecapa_loss=0.0001648, whisper_loss=0.0933, over 3914662.83 frames. ], batch size: 93, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:17:24,743 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-13 11:17:26,216 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 21 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-13 11:17:32,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2138030.0, ans=0.125 2024-08-13 11:17:34,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. 
limit=12.0 2024-08-13 11:17:41,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2138030.0, ans=0.125 2024-08-13 11:18:02,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2138230.0, ans=0.0 2024-08-13 11:18:26,194 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.488e+01 2.800e+01 3.283e+01 5.415e+01, threshold=5.600e+01, percent-clipped=0.0 2024-08-13 11:18:30,675 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 10950, loss[loss=0.1042, beats_loss=0.01226, ecapa_loss=0.0001256, whisper_loss=0.09065, over 18756.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01078, ecapa_loss=0.0001639, whisper_loss=0.09279, over 3902779.25 frames. ], batch size: 72, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:18:38,953 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-13 11:18:47,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2138530.0, ans=0.0 2024-08-13 11:18:47,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2138530.0, ans=0.125 2024-08-13 11:18:55,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2138530.0, ans=0.2 2024-08-13 11:19:13,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2138630.0, ans=0.125 2024-08-13 11:19:16,012 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 11:19:23,637 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 
26 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-13 11:19:25,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2138730.0, ans=0.125 2024-08-13 11:19:31,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2138830.0, ans=0.2 2024-08-13 11:19:32,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2024-08-13 11:19:46,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2138830.0, ans=0.04949747468305833 2024-08-13 11:19:48,397 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11000, loss[loss=0.08912, beats_loss=0.01117, ecapa_loss=0.0001331, whisper_loss=0.07662, over 20925.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001653, whisper_loss=0.09211, over 3876971.84 frames. ], batch size: 83, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:19:52,647 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 11:20:00,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5 2024-08-13 11:20:01,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2139030.0, ans=0.125 2024-08-13 11:20:07,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2139030.0, ans=0.125 2024-08-13 11:20:13,740 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 26 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-13 11:20:25,405 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 11:20:34,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-08-13 11:20:37,393 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 22 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-13 11:20:40,076 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 24 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-13 11:20:47,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.71 vs. limit=22.5 2024-08-13 11:20:48,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2139330.0, ans=0.1 2024-08-13 11:20:54,424 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-13 11:20:55,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2139330.0, ans=0.125 2024-08-13 11:20:58,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.481e+01 2.729e+01 3.286e+01 1.330e+02, threshold=5.458e+01, percent-clipped=4.0 2024-08-13 11:21:03,172 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11050, loss[loss=0.08546, beats_loss=0.01216, ecapa_loss=0.0001859, whisper_loss=0.07144, over 18401.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01081, ecapa_loss=0.0001653, whisper_loss=0.09228, over 3898310.36 frames. ], batch size: 78, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:21:06,312 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-13 11:21:10,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.69 vs. 
limit=12.0 2024-08-13 11:21:13,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=12.0 2024-08-13 11:21:18,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=12.0 2024-08-13 11:21:23,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2139530.0, ans=0.125 2024-08-13 11:21:25,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2139530.0, ans=0.0 2024-08-13 11:21:32,624 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-13 11:21:47,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2139630.0, ans=0.125 2024-08-13 11:22:11,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2139730.0, ans=0.1 2024-08-13 11:22:13,503 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-13 11:22:38,447 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11100, loss[loss=0.09905, beats_loss=0.01218, ecapa_loss=0.0001359, whisper_loss=0.08551, over 23480.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001647, whisper_loss=0.09226, over 3917286.78 frames. ], batch size: 94, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:22:40,321 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 11:22:54,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2139930.0, ans=0.0 2024-08-13 11:23:04,230 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 
26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 11:23:04,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2140030.0, ans=0.2 2024-08-13 11:23:14,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2140030.0, ans=0.125 2024-08-13 11:23:22,347 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:23:38,067 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-13 11:24:00,464 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-13 11:24:07,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.93 vs. limit=10.0 2024-08-13 11:24:11,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.487e+01 2.717e+01 3.069e+01 5.884e+01, threshold=5.434e+01, percent-clipped=1.0 2024-08-13 11:24:13,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2140330.0, ans=0.0 2024-08-13 11:24:16,394 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11150, loss[loss=0.1202, beats_loss=0.009414, ecapa_loss=0.0001598, whisper_loss=0.1092, over 23937.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01078, ecapa_loss=0.0001636, whisper_loss=0.09209, over 3905836.19 frames. ], batch size: 89, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:24:24,237 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-13 11:25:05,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2140730.0, ans=0.125 2024-08-13 11:25:23,846 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 
27 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 11:25:27,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2140830.0, ans=0.0 2024-08-13 11:25:30,056 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11200, loss[loss=0.1119, beats_loss=0.008991, ecapa_loss=0.0001939, whisper_loss=0.1009, over 21351.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01071, ecapa_loss=0.0001643, whisper_loss=0.09232, over 3899901.49 frames. ], batch size: 89, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:25:37,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2140930.0, ans=0.2 2024-08-13 11:25:47,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2141030.0, ans=0.1 2024-08-13 11:25:50,480 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-13 11:26:27,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2141330.0, ans=0.125 2024-08-13 11:26:39,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.420e+01 2.628e+01 2.915e+01 3.904e+01, threshold=5.256e+01, percent-clipped=0.0 2024-08-13 11:26:43,791 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11250, loss[loss=0.1159, beats_loss=0.009815, ecapa_loss=0.000171, whisper_loss=0.1044, over 22604.00 frames. ], tot_loss[loss=0.1053, beats_loss=0.01073, ecapa_loss=0.0001644, whisper_loss=0.09294, over 3919751.97 frames. ], batch size: 90, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:26:55,092 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 11:27:15,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2141630.0, ans=0.125 2024-08-13 11:27:29,743 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-13 11:27:31,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-08-13 11:27:36,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2141730.0, ans=0.125 2024-08-13 11:27:37,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2141730.0, ans=0.0 2024-08-13 11:27:42,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2024-08-13 11:27:55,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5 2024-08-13 11:27:57,772 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11300, loss[loss=0.1052, beats_loss=0.01133, ecapa_loss=0.0001298, whisper_loss=0.09257, over 17076.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01074, ecapa_loss=0.0001646, whisper_loss=0.09245, over 3908661.77 frames. ], batch size: 64, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:27:59,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.56 vs. 
limit=15.0 2024-08-13 11:28:00,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2141930.0, ans=0.0 2024-08-13 11:28:03,720 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-13 11:28:26,489 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-13 11:28:27,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2142130.0, ans=0.1 2024-08-13 11:28:33,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2142130.0, ans=0.125 2024-08-13 11:28:44,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2142230.0, ans=0.1 2024-08-13 11:28:49,554 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-13 11:29:01,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2142330.0, ans=0.2 2024-08-13 11:29:05,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2142330.0, ans=0.125 2024-08-13 11:29:06,996 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.518e+01 2.742e+01 3.086e+01 4.928e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-13 11:29:11,375 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11350, loss[loss=0.1109, beats_loss=0.01086, ecapa_loss=0.0001367, whisper_loss=0.09864, over 18406.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.01068, ecapa_loss=0.0001653, whisper_loss=0.09289, over 3922931.08 frames. 
], batch size: 71, lr: 4.15e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:29:11,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2142430.0, ans=0.1 2024-08-13 11:29:26,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-08-13 11:29:30,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=22.5 2024-08-13 11:29:42,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2142630.0, ans=0.0 2024-08-13 11:29:46,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2142630.0, ans=0.1 2024-08-13 11:29:50,249 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.338e-02 2024-08-13 11:30:05,410 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 11:30:10,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2142830.0, ans=0.1 2024-08-13 11:30:10,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2142830.0, ans=0.0 2024-08-13 11:30:15,512 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-13 11:30:25,345 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11400, loss[loss=0.1173, beats_loss=0.009756, ecapa_loss=0.0001698, whisper_loss=0.1058, over 16355.00 frames. ], tot_loss[loss=0.105, beats_loss=0.0107, ecapa_loss=0.0001656, whisper_loss=0.09267, over 3915144.80 frames. 
], batch size: 63, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:30:26,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2024-08-13 11:30:31,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2142930.0, ans=0.125 2024-08-13 11:30:38,642 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 11:30:47,971 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-13 11:30:52,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-08-13 11:31:00,598 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-13 11:31:09,298 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 11:31:15,424 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-13 11:31:16,793 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-13 11:31:35,368 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 29 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-13 11:31:36,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.547e+01 2.847e+01 3.262e+01 4.632e+01, threshold=5.695e+01, percent-clipped=0.0 2024-08-13 11:31:42,366 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11450, loss[loss=0.11, beats_loss=0.0119, ecapa_loss=0.0001552, whisper_loss=0.09654, over 23007.00 frames. ], tot_loss[loss=0.1047, beats_loss=0.01076, ecapa_loss=0.0001644, whisper_loss=0.09234, over 3907386.63 frames. 
], batch size: 95, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:31:45,968 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-13 11:31:53,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=22.5 2024-08-13 11:31:53,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0 2024-08-13 11:31:56,725 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 11:32:07,916 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 11:32:38,461 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-13 11:32:40,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-13 11:32:41,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2143730.0, ans=0.125 2024-08-13 11:32:54,158 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 26 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 11:33:00,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11500, loss[loss=0.1137, beats_loss=0.01129, ecapa_loss=0.0001446, whisper_loss=0.1009, over 24147.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01075, ecapa_loss=0.0001639, whisper_loss=0.09274, over 3885877.48 frames. ], batch size: 94, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:33:14,411 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 
20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-13 11:33:23,958 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:33:34,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2144130.0, ans=0.125 2024-08-13 11:33:39,053 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.675e-03 2024-08-13 11:33:52,979 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 23 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-13 11:34:07,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2144330.0, ans=15.0 2024-08-13 11:34:10,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.467e+01 2.720e+01 3.175e+01 4.456e+01, threshold=5.439e+01, percent-clipped=0.0 2024-08-13 11:34:10,347 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 24 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-13 11:34:14,715 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11550, loss[loss=0.1025, beats_loss=0.008052, ecapa_loss=0.0002065, whisper_loss=0.09239, over 17975.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01074, ecapa_loss=0.0001648, whisper_loss=0.09248, over 3884069.59 frames. ], batch size: 72, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:34:29,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2144530.0, ans=0.125 2024-08-13 11:34:36,727 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 
21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-13 11:34:36,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2144530.0, ans=0.125 2024-08-13 11:34:43,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2144630.0, ans=0.0 2024-08-13 11:34:53,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2144630.0, ans=0.2 2024-08-13 11:34:59,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2144730.0, ans=0.125 2024-08-13 11:35:23,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2144830.0, ans=0.0 2024-08-13 11:35:28,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2144930.0, ans=0.125 2024-08-13 11:35:29,248 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11600, loss[loss=0.08379, beats_loss=0.01276, ecapa_loss=0.0001139, whisper_loss=0.06989, over 14847.00 frames. ], tot_loss[loss=0.1054, beats_loss=0.01076, ecapa_loss=0.0001629, whisper_loss=0.09299, over 3924354.25 frames. ], batch size: 57, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:35:29,440 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-13 11:35:48,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2145030.0, ans=0.125 2024-08-13 11:36:06,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2145130.0, ans=10.0 2024-08-13 11:36:09,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2145130.0, ans=0.0 2024-08-13 11:36:11,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=12.0 2024-08-13 11:36:15,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.90 vs. limit=22.5 2024-08-13 11:36:16,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2145230.0, ans=10.0 2024-08-13 11:36:30,007 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 11:36:37,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.430e+01 2.771e+01 3.076e+01 5.105e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 11:36:41,860 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11650, loss[loss=0.08653, beats_loss=0.0104, ecapa_loss=0.000152, whisper_loss=0.07461, over 14386.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.01078, ecapa_loss=0.0001629, whisper_loss=0.09267, over 3906052.18 frames. 
], batch size: 55, lr: 4.14e-03, grad_scale: 5.764607523034235e+17 2024-08-13 11:37:00,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2145530.0, ans=0.0 2024-08-13 11:37:12,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2145630.0, ans=0.0 2024-08-13 11:37:21,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2145630.0, ans=0.125 2024-08-13 11:37:30,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2145730.0, ans=0.2 2024-08-13 11:37:44,487 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-13 11:37:57,287 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11700, loss[loss=0.109, beats_loss=0.01093, ecapa_loss=0.0001658, whisper_loss=0.09644, over 23823.00 frames. ], tot_loss[loss=0.1052, beats_loss=0.0108, ecapa_loss=0.000164, whisper_loss=0.09275, over 3956174.36 frames. ], batch size: 94, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:38:02,389 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-13 11:38:33,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2146130.0, ans=0.125 2024-08-13 11:38:34,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=12.0 2024-08-13 11:38:46,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2146230.0, ans=0.125 2024-08-13 11:38:52,320 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 11:38:58,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2146330.0, ans=0.2 2024-08-13 11:39:06,541 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-13 11:39:07,588 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+01 2.516e+01 2.793e+01 3.243e+01 6.496e+01, threshold=5.587e+01, percent-clipped=2.0 2024-08-13 11:39:11,879 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11750, loss[loss=0.1237, beats_loss=0.008088, ecapa_loss=0.0001714, whisper_loss=0.1139, over 15287.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01096, ecapa_loss=0.0001622, whisper_loss=0.09166, over 3921345.37 frames. ], batch size: 59, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:39:27,637 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 17 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-13 11:39:32,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-13 11:39:34,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2024-08-13 11:39:35,361 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-13 11:39:44,295 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 33 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-13 11:39:47,230 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-13 11:39:47,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=2146630.0, ans=15.0 2024-08-13 11:39:53,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2024-08-13 11:39:55,567 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-13 11:40:05,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2146730.0, ans=0.0 2024-08-13 11:40:18,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2146830.0, ans=0.125 2024-08-13 11:40:23,400 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11800, loss[loss=0.08433, beats_loss=0.01318, ecapa_loss=0.0001346, whisper_loss=0.06981, over 22025.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01102, ecapa_loss=0.0001621, whisper_loss=0.09057, over 3927016.04 frames. ], batch size: 87, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:40:43,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.96 vs. limit=10.0 2024-08-13 11:40:52,867 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 
30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-13 11:41:02,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2147130.0, ans=10.0 2024-08-13 11:41:04,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2147230.0, ans=0.125 2024-08-13 11:41:29,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.422e+01 2.679e+01 2.998e+01 8.058e+01, threshold=5.358e+01, percent-clipped=1.0 2024-08-13 11:41:33,445 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11850, loss[loss=0.09793, beats_loss=0.0122, ecapa_loss=0.0001768, whisper_loss=0.08396, over 19612.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01105, ecapa_loss=0.0001616, whisper_loss=0.09118, over 3927920.43 frames. ], batch size: 83, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:41:36,079 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 22 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-13 11:41:36,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2147430.0, ans=0.125 2024-08-13 11:41:39,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2147430.0, ans=0.07 2024-08-13 11:41:40,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2147430.0, ans=0.0 2024-08-13 11:41:41,577 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-13 11:42:01,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. 
limit=15.0 2024-08-13 11:42:30,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2147830.0, ans=0.125 2024-08-13 11:42:33,705 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 11:42:34,961 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-13 11:42:42,677 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11900, loss[loss=0.1247, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.1129, over 16434.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.011, ecapa_loss=0.0001626, whisper_loss=0.09191, over 3934242.19 frames. ], batch size: 60, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:42:43,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-13 11:42:44,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2147930.0, ans=0.125 2024-08-13 11:42:58,218 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 11:42:59,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2148030.0, ans=0.0 2024-08-13 11:43:05,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2148030.0, ans=0.125 2024-08-13 11:43:22,901 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
30 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 11:43:23,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2148230.0, ans=0.1 2024-08-13 11:43:47,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.622e+01 2.921e+01 5.658e+01, threshold=5.245e+01, percent-clipped=1.0 2024-08-13 11:43:51,922 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 11950, loss[loss=0.1238, beats_loss=0.009076, ecapa_loss=0.0001601, whisper_loss=0.1131, over 19614.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01092, ecapa_loss=0.000164, whisper_loss=0.09172, over 3936031.47 frames. ], batch size: 73, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:05,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2148530.0, ans=0.1 2024-08-13 11:44:22,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0 2024-08-13 11:44:31,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2148730.0, ans=0.125 2024-08-13 11:44:34,122 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-13 11:44:49,334 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 11:44:57,277 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12000, loss[loss=0.1047, beats_loss=0.00883, ecapa_loss=0.0001449, whisper_loss=0.09447, over 19101.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01091, ecapa_loss=0.0001633, whisper_loss=0.09176, over 3899961.86 frames. 
], batch size: 71, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:44:57,278 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 11:45:36,606 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005616, whisper_loss=0.2486, over 922467.00 frames. 2024-08-13 11:45:55,798 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on SV_voxceleb1: loss=0.004517, beats_loss=0, ecapa_loss=0.0004517, whisper_loss=0, over 939242.00 frames. 2024-08-13 11:46:35,247 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.2605, 1.8330, 1.6617, 1.6462], device='cuda:0') 2024-08-13 11:47:56,494 INFO [train_multi_KD3.py:1149] (0/4) Epoch 15, validation on AT_audioset: loss=0.0239, beats_loss=0.0239, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 11:47:56,498 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 11:48:07,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2148930.0, ans=0.1 2024-08-13 11:48:15,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2149030.0, ans=0.0 2024-08-13 11:48:19,567 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
23 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-13 11:48:39,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2149230.0, ans=0.0 2024-08-13 11:48:59,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.423e+01 2.671e+01 3.267e+01 7.662e+01, threshold=5.342e+01, percent-clipped=3.0 2024-08-13 11:49:03,393 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12050, loss[loss=0.1038, beats_loss=0.01152, ecapa_loss=0.0001328, whisper_loss=0.09092, over 17013.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.0109, ecapa_loss=0.0001631, whisper_loss=0.09124, over 3878900.03 frames. ], batch size: 65, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:49:15,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2149530.0, ans=0.1 2024-08-13 11:49:20,630 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-13 11:49:37,050 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-13 11:49:38,377 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 11:49:47,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2149730.0, ans=0.125 2024-08-13 11:49:49,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2149730.0, ans=0.1 2024-08-13 11:50:07,986 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12100, loss[loss=0.09532, beats_loss=0.01143, ecapa_loss=0.0002233, whisper_loss=0.08166, over 18509.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01084, ecapa_loss=0.0001644, whisper_loss=0.0914, over 3879212.60 frames. 
], batch size: 79, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:50:11,695 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 11:50:16,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-08-13 11:50:21,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2150030.0, ans=0.125 2024-08-13 11:50:25,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2150030.0, ans=0.0 2024-08-13 11:50:37,716 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-13 11:50:49,579 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-13 11:50:52,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2150230.0, ans=0.2 2024-08-13 11:50:56,913 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 11:51:02,352 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-13 11:51:02,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=12.0 2024-08-13 11:51:05,202 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-13 11:51:08,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.393e+01 2.671e+01 2.986e+01 4.532e+01, threshold=5.343e+01, percent-clipped=0.0 2024-08-13 11:51:12,759 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12150, loss[loss=0.09321, beats_loss=0.01034, ecapa_loss=0.0002105, whisper_loss=0.08076, over 18841.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01081, ecapa_loss=0.0001653, whisper_loss=0.09168, over 3907761.83 frames. ], batch size: 81, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:51:15,552 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 22 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-13 11:51:15,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2150430.0, ans=0.125 2024-08-13 11:51:30,628 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 11:51:32,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2150530.0, ans=0.125 2024-08-13 11:51:37,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2150530.0, ans=0.1 2024-08-13 11:51:46,623 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 11:51:51,721 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 23 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-13 11:51:54,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2150730.0, ans=0.125 2024-08-13 11:52:19,380 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12200, loss[loss=0.08607, beats_loss=0.01299, ecapa_loss=0.0001573, whisper_loss=0.0715, over 15669.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001657, whisper_loss=0.09095, over 3866380.30 frames. ], batch size: 64, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:52:36,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2151030.0, ans=0.0 2024-08-13 11:52:38,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-13 11:52:44,343 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 18 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-13 11:52:44,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2151130.0, ans=0.125 2024-08-13 11:53:02,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2024-08-13 11:53:03,952 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 24 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-13 11:53:09,468 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 11:53:18,597 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-13 11:53:21,090 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.490e+01 2.780e+01 3.147e+01 4.927e+01, threshold=5.560e+01, percent-clipped=0.0 2024-08-13 11:53:25,104 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12250, loss[loss=0.1165, beats_loss=0.009506, ecapa_loss=0.0001693, whisper_loss=0.1053, over 17896.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01083, ecapa_loss=0.0001653, whisper_loss=0.09091, over 3877327.30 frames. 
], batch size: 69, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:53:33,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2024-08-13 11:53:35,746 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-13 11:53:47,550 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 11:53:48,756 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 16 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 11:53:52,823 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.076e+01 2024-08-13 11:53:56,473 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 19 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-13 11:53:56,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2151630.0, ans=0.125 2024-08-13 11:53:58,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2151630.0, ans=0.2 2024-08-13 11:54:00,547 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
19 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-13 11:54:04,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2151730.0, ans=0.125 2024-08-13 11:54:10,404 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.633e+05 2024-08-13 11:54:12,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2151730.0, ans=0.07 2024-08-13 11:54:12,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2151730.0, ans=10.0 2024-08-13 11:54:13,666 INFO [train_multi_KD3.py:844] (0/4) A total of 95 cuts. 36 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-13 11:54:17,546 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-13 11:54:30,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12300, loss[loss=0.1066, beats_loss=0.01082, ecapa_loss=0.0001221, whisper_loss=0.09459, over 17152.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01075, ecapa_loss=0.0001654, whisper_loss=0.09041, over 3855051.66 frames. ], batch size: 63, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:54:41,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2024-08-13 11:54:44,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2152030.0, ans=0.125 2024-08-13 11:55:01,243 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 
21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-13 11:55:01,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2152130.0, ans=0.125 2024-08-13 11:55:06,546 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 25 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-13 11:55:06,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2152130.0, ans=0.2 2024-08-13 11:55:17,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2152230.0, ans=0.125 2024-08-13 11:55:21,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2152330.0, ans=0.0 2024-08-13 11:55:23,074 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-13 11:55:28,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-13 11:55:32,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.395e+01 2.675e+01 2.989e+01 4.697e+01, threshold=5.351e+01, percent-clipped=0.0 2024-08-13 11:55:34,135 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 21 from LS+wenet, 31 from Vox, 42 fro AS 2024-08-13 11:55:36,414 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12350, loss[loss=0.07253, beats_loss=0.01006, ecapa_loss=0.0002035, whisper_loss=0.06044, over 12147.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01076, ecapa_loss=0.0001662, whisper_loss=0.09075, over 3863229.51 frames. ], batch size: 53, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:55:40,370 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 29 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-13 11:56:13,875 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 
21 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-13 11:56:14,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2024-08-13 11:56:19,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=12.0 2024-08-13 11:56:32,013 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-13 11:56:34,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2152830.0, ans=0.125 2024-08-13 11:56:35,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2024-08-13 11:56:40,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2152930.0, ans=0.2 2024-08-13 11:56:40,993 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12400, loss[loss=0.1463, beats_loss=0.007247, ecapa_loss=0.0001627, whisper_loss=0.1374, over 23870.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01065, ecapa_loss=0.0001652, whisper_loss=0.09187, over 3884649.07 frames. ], batch size: 85, lr: 4.14e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:56:51,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2152930.0, ans=0.1 2024-08-13 11:56:55,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-13 11:56:58,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.91 vs. 
limit=15.0 2024-08-13 11:57:00,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2153030.0, ans=0.0 2024-08-13 11:57:01,685 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 11:57:28,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2153230.0, ans=0.0 2024-08-13 11:57:35,051 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 11:57:43,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.413e+01 2.568e+01 2.884e+01 5.690e+01, threshold=5.135e+01, percent-clipped=1.0 2024-08-13 11:57:47,377 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12450, loss[loss=0.1119, beats_loss=0.007044, ecapa_loss=0.0001664, whisper_loss=0.1032, over 20512.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01062, ecapa_loss=0.0001653, whisper_loss=0.09224, over 3865388.06 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:57:47,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2153430.0, ans=0.07 2024-08-13 11:57:50,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2153430.0, ans=0.1 2024-08-13 11:57:57,666 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 
24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-13 11:57:59,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2153530.0, ans=0.07 2024-08-13 11:58:07,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2153530.0, ans=0.07 2024-08-13 11:58:08,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2153530.0, ans=0.125 2024-08-13 11:58:13,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2153630.0, ans=0.0 2024-08-13 11:58:28,096 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-13 11:58:38,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-08-13 11:58:42,624 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.447e-01 2024-08-13 11:58:53,009 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12500, loss[loss=0.1152, beats_loss=0.009444, ecapa_loss=0.0001551, whisper_loss=0.1042, over 23439.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01066, ecapa_loss=0.000165, whisper_loss=0.0919, over 3895359.36 frames. 
], batch size: 91, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 11:59:10,613 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.740e+01 2024-08-13 11:59:17,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2154030.0, ans=0.1 2024-08-13 11:59:31,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2154230.0, ans=0.0 2024-08-13 11:59:32,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2154230.0, ans=0.1 2024-08-13 11:59:33,775 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 22 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-13 11:59:35,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2154230.0, ans=0.1 2024-08-13 11:59:38,785 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 11:59:45,645 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 11:59:54,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.421e+01 2.658e+01 2.977e+01 4.803e+01, threshold=5.316e+01, percent-clipped=0.0 2024-08-13 11:59:58,480 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12550, loss[loss=0.08005, beats_loss=0.01189, ecapa_loss=0.0001542, whisper_loss=0.06661, over 21065.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01064, ecapa_loss=0.000164, whisper_loss=0.09218, over 3867937.77 frames. 
], batch size: 85, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:00:13,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2154530.0, ans=0.125 2024-08-13 12:00:16,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2154530.0, ans=0.0 2024-08-13 12:00:23,831 INFO [train_multi_KD3.py:844] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 12:00:30,337 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 17 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-13 12:00:38,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=22.5 2024-08-13 12:00:39,422 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 32 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-13 12:01:02,178 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-13 12:01:04,864 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12600, loss[loss=0.1058, beats_loss=0.01073, ecapa_loss=0.0001624, whisper_loss=0.09344, over 18041.00 frames. ], tot_loss[loss=0.1048, beats_loss=0.01071, ecapa_loss=0.0001629, whisper_loss=0.09241, over 3890725.86 frames. ], batch size: 72, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:01:14,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. 
limit=15.0 2024-08-13 12:01:36,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2155130.0, ans=0.125 2024-08-13 12:01:37,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2155130.0, ans=0.0 2024-08-13 12:01:52,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2155230.0, ans=0.0 2024-08-13 12:01:56,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2155330.0, ans=0.0 2024-08-13 12:02:06,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.360e+01 2.643e+01 2.873e+01 1.126e+02, threshold=5.286e+01, percent-clipped=2.0 2024-08-13 12:02:10,466 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12650, loss[loss=0.08349, beats_loss=0.0102, ecapa_loss=0.0002122, whisper_loss=0.07117, over 20023.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01072, ecapa_loss=0.0001633, whisper_loss=0.09211, over 3867176.85 frames. ], batch size: 87, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:02:57,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2155730.0, ans=0.0 2024-08-13 12:03:04,211 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-13 12:03:05,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2155830.0, ans=0.125 2024-08-13 12:03:14,924 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12700, loss[loss=0.1358, beats_loss=0.008728, ecapa_loss=0.0001441, whisper_loss=0.1256, over 23764.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001621, whisper_loss=0.0919, over 3869875.33 frames. 
], batch size: 89, lr: 4.13e-03, grad_scale: 1.152921504606847e+18 2024-08-13 12:03:20,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2155930.0, ans=10.0 2024-08-13 12:03:23,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2155930.0, ans=0.0 2024-08-13 12:03:25,608 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 14 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-13 12:03:37,216 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-13 12:03:38,672 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 16 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-13 12:03:38,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2156030.0, ans=0.125 2024-08-13 12:03:44,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2156130.0, ans=0.125 2024-08-13 12:03:45,346 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-13 12:03:49,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2156130.0, ans=0.0 2024-08-13 12:03:54,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2156230.0, ans=0.125 2024-08-13 12:03:55,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2156230.0, ans=0.125 2024-08-13 12:04:11,048 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-13 12:04:12,492 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
24 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-13 12:04:14,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.16 vs. limit=22.5 2024-08-13 12:04:17,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.422e+01 2.694e+01 3.051e+01 5.714e+01, threshold=5.388e+01, percent-clipped=1.0 2024-08-13 12:04:18,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2156430.0, ans=0.125 2024-08-13 12:04:19,921 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12750, loss[loss=0.08496, beats_loss=0.01319, ecapa_loss=0.0001799, whisper_loss=0.06997, over 22387.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01088, ecapa_loss=0.0001634, whisper_loss=0.09175, over 3902766.62 frames. ], batch size: 95, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:04:28,031 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-13 12:04:42,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2156530.0, ans=0.5 2024-08-13 12:04:43,765 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 26 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 12:04:45,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2156630.0, ans=0.035 2024-08-13 12:05:20,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2156830.0, ans=0.125 2024-08-13 12:05:27,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12800, loss[loss=0.1109, beats_loss=0.0105, ecapa_loss=0.0002368, whisper_loss=0.098, over 20926.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01087, ecapa_loss=0.0001654, whisper_loss=0.09237, over 3914287.37 frames. 
], batch size: 89, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:05:31,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2156930.0, ans=0.0 2024-08-13 12:05:45,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-13 12:05:53,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2157130.0, ans=10.0 2024-08-13 12:05:54,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-08-13 12:06:06,085 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-13 12:06:14,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2157230.0, ans=0.125 2024-08-13 12:06:17,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2157230.0, ans=0.0 2024-08-13 12:06:25,470 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-13 12:06:26,966 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-13 12:06:34,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.329e+01 2.579e+01 3.123e+01 7.384e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-13 12:06:37,259 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12850, loss[loss=0.109, beats_loss=0.007811, ecapa_loss=0.0002258, whisper_loss=0.09893, over 13130.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01088, ecapa_loss=0.0001647, whisper_loss=0.09171, over 3887628.90 frames. 
], batch size: 54, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:06:38,643 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 22 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-13 12:06:57,891 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-13 12:07:04,577 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-13 12:07:24,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2157730.0, ans=0.0 2024-08-13 12:07:34,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2157830.0, ans=0.0 2024-08-13 12:07:35,918 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-13 12:07:49,835 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12900, loss[loss=0.09563, beats_loss=0.009057, ecapa_loss=0.0001802, whisper_loss=0.08477, over 19942.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01096, ecapa_loss=0.000165, whisper_loss=0.09079, over 3881397.95 frames. ], batch size: 77, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:08:04,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2024-08-13 12:08:08,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2158030.0, ans=0.125 2024-08-13 12:08:52,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.22 vs. 
limit=15.0 2024-08-13 12:09:00,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2158330.0, ans=0.0 2024-08-13 12:09:01,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.397e+01 2.771e+01 3.216e+01 4.644e+01, threshold=5.541e+01, percent-clipped=0.0 2024-08-13 12:09:05,069 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 12950, loss[loss=0.1098, beats_loss=0.009747, ecapa_loss=0.0001762, whisper_loss=0.09832, over 22325.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01091, ecapa_loss=0.0001651, whisper_loss=0.09065, over 3862196.72 frames. ], batch size: 90, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:09:14,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2158430.0, ans=0.125 2024-08-13 12:10:19,334 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13000, loss[loss=0.109, beats_loss=0.0103, ecapa_loss=0.0001484, whisper_loss=0.09719, over 13457.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01084, ecapa_loss=0.0001658, whisper_loss=0.09085, over 3886147.32 frames. ], batch size: 54, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:10:21,958 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 19 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-13 12:10:29,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2158930.0, ans=0.0 2024-08-13 12:10:33,632 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-13 12:10:38,908 INFO [train_multi_KD3.py:844] (0/4) A total of 55 cuts. 
13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-13 12:10:45,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2159130.0, ans=0.1 2024-08-13 12:10:47,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-13 12:10:55,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2159130.0, ans=0.125 2024-08-13 12:10:59,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2159130.0, ans=0.125 2024-08-13 12:11:27,308 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-13 12:11:30,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+01 2.395e+01 2.755e+01 3.311e+01 7.767e+01, threshold=5.510e+01, percent-clipped=1.0 2024-08-13 12:11:33,549 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13050, loss[loss=0.109, beats_loss=0.01175, ecapa_loss=0.0001741, whisper_loss=0.09549, over 23550.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01082, ecapa_loss=0.0001651, whisper_loss=0.09097, over 3879689.15 frames. ], batch size: 94, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:11:35,514 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 31 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-13 12:11:38,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2159430.0, ans=0.2 2024-08-13 12:12:14,364 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-13 12:12:14,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2159630.0, ans=0.0 2024-08-13 12:12:24,510 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 12:12:26,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=12.0 2024-08-13 12:12:50,429 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13100, loss[loss=0.1144, beats_loss=0.008702, ecapa_loss=0.0001439, whisper_loss=0.1042, over 19640.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01083, ecapa_loss=0.0001642, whisper_loss=0.09116, over 3904977.96 frames. ], batch size: 71, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:12:52,061 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-13 12:13:00,927 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-216000.pt 2024-08-13 12:13:06,827 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-13 12:13:13,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-13 12:13:15,926 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-13 12:13:20,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.69 vs. 
limit=15.0 2024-08-13 12:13:31,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-13 12:13:42,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2160230.0, ans=0.025 2024-08-13 12:13:51,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2160230.0, ans=0.125 2024-08-13 12:13:52,869 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-13 12:14:09,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+01 2.515e+01 2.737e+01 3.188e+01 6.948e+01, threshold=5.474e+01, percent-clipped=1.0 2024-08-13 12:14:11,529 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-13 12:14:12,774 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13150, loss[loss=0.1044, beats_loss=0.01135, ecapa_loss=0.0001535, whisper_loss=0.0915, over 22326.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01088, ecapa_loss=0.0001632, whisper_loss=0.09094, over 3904483.44 frames. ], batch size: 88, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:14:46,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2160630.0, ans=0.125 2024-08-13 12:14:54,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2160630.0, ans=0.2 2024-08-13 12:14:56,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2160630.0, ans=0.1 2024-08-13 12:15:00,871 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 
29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-13 12:15:01,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2160730.0, ans=0.2 2024-08-13 12:15:11,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2160730.0, ans=0.125 2024-08-13 12:15:15,062 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 28 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-13 12:15:15,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2160830.0, ans=0.1 2024-08-13 12:15:32,581 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13200, loss[loss=0.1045, beats_loss=0.01042, ecapa_loss=0.0001804, whisper_loss=0.09228, over 16194.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01086, ecapa_loss=0.0001625, whisper_loss=0.0913, over 3897140.07 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:15:39,320 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 26 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-13 12:15:41,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2160930.0, ans=0.05 2024-08-13 12:15:43,873 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 26 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-13 12:15:53,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2161030.0, ans=0.125 2024-08-13 12:16:08,038 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-13 12:16:11,303 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-13 12:16:11,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2161130.0, ans=0.125 2024-08-13 12:16:51,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.301e+01 2.587e+01 2.900e+01 9.399e+01, threshold=5.174e+01, percent-clipped=1.0 2024-08-13 12:16:54,504 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13250, loss[loss=0.08281, beats_loss=0.01156, ecapa_loss=0.0001912, whisper_loss=0.06933, over 22212.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01076, ecapa_loss=0.0001641, whisper_loss=0.09177, over 3891310.43 frames. ], batch size: 94, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:16:54,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2161430.0, ans=0.125 2024-08-13 12:17:05,560 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-13 12:17:07,801 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-13 12:17:30,504 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-13 12:17:33,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2161630.0, ans=0.0 2024-08-13 12:17:50,439 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-13 12:18:12,228 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13300, loss[loss=0.1132, beats_loss=0.007937, ecapa_loss=0.0002045, whisper_loss=0.1033, over 13505.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001641, whisper_loss=0.09201, over 3881893.72 frames. 
], batch size: 55, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:18:36,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2162030.0, ans=0.04949747468305833 2024-08-13 12:18:50,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.70 vs. limit=22.5 2024-08-13 12:19:11,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2162230.0, ans=0.1 2024-08-13 12:19:18,953 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 from AS 2024-08-13 12:19:21,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2162330.0, ans=0.0 2024-08-13 12:19:25,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=22.5 2024-08-13 12:19:29,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.345e+01 2.598e+01 2.972e+01 4.210e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-13 12:19:33,258 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13350, loss[loss=0.08758, beats_loss=0.01287, ecapa_loss=0.0001311, whisper_loss=0.0734, over 15792.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01085, ecapa_loss=0.0001629, whisper_loss=0.09142, over 3873807.84 frames. 
], batch size: 61, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:19:36,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2162430.0, ans=0.125 2024-08-13 12:19:49,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2162530.0, ans=0.0 2024-08-13 12:20:03,224 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 19 from LS+wenet, 18 from Vox, 17 from AS 2024-08-13 12:20:15,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2162630.0, ans=0.1 2024-08-13 12:20:20,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2162730.0, ans=0.2 2024-08-13 12:20:22,073 INFO [train_multi_KD3.py:844] (0/4) A total of 78 cuts. 28 from LS+wenet, 13 from Vox, 37 from AS 2024-08-13 12:20:26,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2162730.0, ans=0.125 2024-08-13 12:20:28,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2162730.0, ans=0.0 2024-08-13 12:20:46,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2162830.0, ans=0.125 2024-08-13 12:20:50,003 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13400, loss[loss=0.107, beats_loss=0.008711, ecapa_loss=0.0001714, whisper_loss=0.09655, over 15265.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.0001619, whisper_loss=0.09091, over 3845219.23 frames. 
], batch size: 59, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:21:00,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2162930.0, ans=0.2 2024-08-13 12:21:08,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2163030.0, ans=0.125 2024-08-13 12:21:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2163030.0, ans=0.125 2024-08-13 12:21:25,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2163130.0, ans=0.2 2024-08-13 12:21:48,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2024-08-13 12:22:06,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.423e+01 2.749e+01 3.162e+01 4.773e+01, threshold=5.498e+01, percent-clipped=0.0 2024-08-13 12:22:08,848 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13450, loss[loss=0.09342, beats_loss=0.01193, ecapa_loss=0.0001711, whisper_loss=0.07978, over 16266.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01093, ecapa_loss=0.0001618, whisper_loss=0.09075, over 3870861.73 frames. ], batch size: 66, lr: 4.13e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:22:17,055 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 15 from LS+wenet, 28 from Vox, 24 from AS 2024-08-13 12:22:17,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-13 12:22:19,973 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
23 from LS+wenet, 16 from Vox, 24 from AS 2024-08-13 12:22:25,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2163530.0, ans=0.0 2024-08-13 12:22:35,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2163530.0, ans=0.125 2024-08-13 12:23:03,688 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:23:03,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2163730.0, ans=0.125 2024-08-13 12:23:15,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2163830.0, ans=0.2 2024-08-13 12:23:24,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2163830.0, ans=0.125 2024-08-13 12:23:26,664 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13500, loss[loss=0.09983, beats_loss=0.01097, ecapa_loss=0.0001948, whisper_loss=0.08691, over 18171.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01088, ecapa_loss=0.0001632, whisper_loss=0.0907, over 3846626.59 frames. ], batch size: 73, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:23:29,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-08-13 12:24:01,102 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 from AS 2024-08-13 12:24:04,078 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
10 from LS+wenet, 15 from Vox, 28 from AS 2024-08-13 12:24:07,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2164130.0, ans=0.1 2024-08-13 12:24:16,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2164230.0, ans=0.125 2024-08-13 12:24:31,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2164330.0, ans=0.125 2024-08-13 12:24:40,255 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 19 from Vox, 16 from AS 2024-08-13 12:24:41,395 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.326e+01 2.605e+01 3.115e+01 6.571e+01, threshold=5.210e+01, percent-clipped=1.0 2024-08-13 12:24:45,034 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13550, loss[loss=0.1078, beats_loss=0.009496, ecapa_loss=0.0001709, whisper_loss=0.09661, over 18085.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0109, ecapa_loss=0.0001626, whisper_loss=0.0904, over 3855903.27 frames. ], batch size: 71, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:24:45,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-13 12:24:45,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2024-08-13 12:24:51,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2024-08-13 12:24:54,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2164430.0, ans=0.0 2024-08-13 12:25:25,676 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 from AS 2024-08-13 12:25:31,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-13 12:25:33,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2164730.0, ans=0.0 2024-08-13 12:25:49,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2164830.0, ans=0.0 2024-08-13 12:26:02,023 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13600, loss[loss=0.09616, beats_loss=0.007933, ecapa_loss=0.0002173, whisper_loss=0.08605, over 20471.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01087, ecapa_loss=0.0001647, whisper_loss=0.09034, over 3845036.91 frames. ], batch size: 84, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:26:16,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2165030.0, ans=0.0 2024-08-13 12:26:24,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2165030.0, ans=0.125 2024-08-13 12:26:31,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2165030.0, ans=0.2 2024-08-13 12:26:50,091 INFO [train_multi_KD3.py:844] (0/4) A total of 69 cuts. 
23 from LS+wenet, 16 from Vox, 30 from AS 2024-08-13 12:27:17,528 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.537e+01 2.794e+01 3.122e+01 4.623e+01, threshold=5.587e+01, percent-clipped=0.0 2024-08-13 12:27:20,453 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13650, loss[loss=0.1121, beats_loss=0.009724, ecapa_loss=0.0001617, whisper_loss=0.1007, over 21581.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01085, ecapa_loss=0.000165, whisper_loss=0.09092, over 3862499.37 frames. ], batch size: 84, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:27:20,691 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 from AS 2024-08-13 12:27:33,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-08-13 12:27:43,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2165530.0, ans=0.95 2024-08-13 12:27:54,576 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.751e+01 2024-08-13 12:27:59,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2165630.0, ans=0.0 2024-08-13 12:28:03,758 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 12 from Vox, 23 from AS 2024-08-13 12:28:24,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2165830.0, ans=0.0 2024-08-13 12:28:25,562 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 
16 from LS+wenet, 21 from Vox, 21 from AS 2024-08-13 12:28:30,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2165830.0, ans=0.125 2024-08-13 12:28:35,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2165830.0, ans=0.125 2024-08-13 12:28:36,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2165930.0, ans=0.125 2024-08-13 12:28:38,037 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13700, loss[loss=0.1239, beats_loss=0.009624, ecapa_loss=0.0001727, whisper_loss=0.1125, over 23320.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01087, ecapa_loss=0.000165, whisper_loss=0.09144, over 3869247.21 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:28:55,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2166030.0, ans=15.0 2024-08-13 12:29:07,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2166130.0, ans=0.125 2024-08-13 12:29:26,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2166230.0, ans=0.1 2024-08-13 12:29:28,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2166230.0, ans=0.0 2024-08-13 12:29:44,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2166330.0, ans=0.125 2024-08-13 12:29:44,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2166330.0, ans=0.0 2024-08-13 12:29:52,351 INFO 
[optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.509e+01 2.844e+01 3.319e+01 7.223e+01, threshold=5.689e+01, percent-clipped=1.0 2024-08-13 12:29:55,318 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13750, loss[loss=0.07624, beats_loss=0.01194, ecapa_loss=0.000166, whisper_loss=0.06264, over 22265.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01085, ecapa_loss=0.0001653, whisper_loss=0.09117, over 3903392.39 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:30:16,383 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 from AS 2024-08-13 12:30:18,655 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 from AS 2024-08-13 12:30:23,335 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 from AS 2024-08-13 12:30:23,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2166530.0, ans=0.125 2024-08-13 12:30:31,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2166630.0, ans=0.1 2024-08-13 12:30:37,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2166630.0, ans=0.0 2024-08-13 12:30:44,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2166730.0, ans=0.125 2024-08-13 12:30:44,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2166730.0, ans=0.125 2024-08-13 12:30:47,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2166730.0, ans=0.0 2024-08-13 12:31:12,661 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13800, loss[loss=0.1174, beats_loss=0.01005, 
ecapa_loss=0.0001781, whisper_loss=0.1056, over 22337.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01075, ecapa_loss=0.0001646, whisper_loss=0.09218, over 3888914.49 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:31:17,216 INFO [train_multi_KD3.py:844] (0/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 from AS 2024-08-13 12:31:17,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2166930.0, ans=0.1 2024-08-13 12:31:36,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2167030.0, ans=15.0 2024-08-13 12:31:51,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2167130.0, ans=0.05 2024-08-13 12:32:05,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2024-08-13 12:32:08,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2167230.0, ans=0.2 2024-08-13 12:32:26,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.03 vs. limit=22.5 2024-08-13 12:32:26,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.343e+01 2.633e+01 2.825e+01 4.077e+01, threshold=5.266e+01, percent-clipped=0.0 2024-08-13 12:32:30,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13850, loss[loss=0.07447, beats_loss=0.01565, ecapa_loss=0.0001429, whisper_loss=0.0574, over 21717.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01082, ecapa_loss=0.0001636, whisper_loss=0.09182, over 3894995.25 frames. 
], batch size: 92, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:32:35,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.09 vs. limit=6.0 2024-08-13 12:32:57,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2167530.0, ans=15.0 2024-08-13 12:33:33,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-08-13 12:33:38,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2167830.0, ans=0.2 2024-08-13 12:33:43,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2167830.0, ans=0.125 2024-08-13 12:33:47,692 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13900, loss[loss=0.1062, beats_loss=0.01012, ecapa_loss=0.0001683, whisper_loss=0.09441, over 21796.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.0108, ecapa_loss=0.000164, whisper_loss=0.09214, over 3895514.43 frames. ], batch size: 88, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:33:54,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2167930.0, ans=0.125 2024-08-13 12:33:57,019 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 from AS 2024-08-13 12:33:57,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2167930.0, ans=0.1 2024-08-13 12:34:03,336 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 
14 from LS+wenet, 25 from Vox, 21 from AS 2024-08-13 12:34:09,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2168030.0, ans=0.125 2024-08-13 12:34:20,286 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 16 from LS+wenet, 18 from Vox, 20 from AS 2024-08-13 12:34:24,370 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 20 from LS+wenet, 14 from Vox, 22 from AS 2024-08-13 12:34:46,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=15.0 2024-08-13 12:34:49,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2168330.0, ans=0.125 2024-08-13 12:34:52,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=12.0 2024-08-13 12:34:57,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2024-08-13 12:35:02,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.480e+01 2.802e+01 3.173e+01 5.254e+01, threshold=5.604e+01, percent-clipped=0.0 2024-08-13 12:35:04,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2168430.0, ans=0.125 2024-08-13 12:35:05,019 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 13950, loss[loss=0.08848, beats_loss=0.01198, ecapa_loss=0.000157, whisper_loss=0.07493, over 19002.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01084, ecapa_loss=0.0001639, whisper_loss=0.09184, over 3855386.70 frames. 
], batch size: 78, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:35:06,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2168430.0, ans=0.125 2024-08-13 12:35:08,385 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 from AS 2024-08-13 12:35:10,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2168430.0, ans=0.2 2024-08-13 12:35:25,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2168530.0, ans=0.2 2024-08-13 12:35:33,412 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 from AS 2024-08-13 12:35:34,930 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 34 from Vox, 29 from AS 2024-08-13 12:35:35,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2168630.0, ans=0.0 2024-08-13 12:35:41,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2168630.0, ans=0.125 2024-08-13 12:35:50,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2168730.0, ans=0.0 2024-08-13 12:36:04,258 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 25 from LS+wenet, 14 from Vox, 19 from AS 2024-08-13 12:36:13,823 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 from AS 2024-08-13 12:36:20,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2168830.0, ans=0.125 2024-08-13 12:36:31,140 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14000, loss[loss=0.1017, beats_loss=0.01274, ecapa_loss=0.000164, whisper_loss=0.08728, over 22206.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01089, ecapa_loss=0.0001632, whisper_loss=0.09146, over 3904567.22 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:37:10,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2169130.0, ans=0.025 2024-08-13 12:37:12,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2169130.0, ans=0.125 2024-08-13 12:37:46,053 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-13 12:37:55,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2169330.0, ans=0.1 2024-08-13 12:37:56,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.610e+01 2.869e+01 3.326e+01 4.545e+01, threshold=5.739e+01, percent-clipped=0.0 2024-08-13 12:38:02,086 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14050, loss[loss=0.1113, beats_loss=0.01155, ecapa_loss=0.0001584, whisper_loss=0.09821, over 15383.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01099, ecapa_loss=0.0001619, whisper_loss=0.09109, over 3903331.92 frames. ], batch size: 60, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:38:05,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2169430.0, ans=0.2 2024-08-13 12:38:15,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2169430.0, ans=0.125 2024-08-13 12:38:22,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2169430.0, ans=0.125 2024-08-13 12:38:34,242 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 
30 from LS+wenet, 18 from Vox, 39 from AS 2024-08-13 12:39:01,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2169630.0, ans=0.125 2024-08-13 12:39:42,584 INFO [train_multi_KD3.py:844] (0/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 from AS 2024-08-13 12:39:48,338 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14100, loss[loss=0.1025, beats_loss=0.01154, ecapa_loss=0.0001444, whisper_loss=0.08956, over 18301.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01096, ecapa_loss=0.0001618, whisper_loss=0.09119, over 3879101.28 frames. ], batch size: 71, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:40:07,558 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 from AS 2024-08-13 12:40:12,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2170030.0, ans=0.1 2024-08-13 12:40:39,724 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 from AS 2024-08-13 12:40:44,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=15.0 2024-08-13 12:40:47,649 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 from AS 2024-08-13 12:41:23,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2170330.0, ans=0.1 2024-08-13 12:41:31,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. 
limit=22.5 2024-08-13 12:41:39,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.420e+01 2.685e+01 3.019e+01 4.436e+01, threshold=5.370e+01, percent-clipped=0.0 2024-08-13 12:41:45,784 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14150, loss[loss=0.1158, beats_loss=0.008578, ecapa_loss=0.0001722, whisper_loss=0.1055, over 13851.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01099, ecapa_loss=0.0001609, whisper_loss=0.0912, over 3868547.43 frames. ], batch size: 55, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:42:24,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2170530.0, ans=0.05 2024-08-13 12:42:26,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2170530.0, ans=0.125 2024-08-13 12:43:13,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2170730.0, ans=0.0 2024-08-13 12:43:26,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2170830.0, ans=0.125 2024-08-13 12:43:38,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2170830.0, ans=0.0 2024-08-13 12:43:46,006 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14200, loss[loss=0.1183, beats_loss=0.0103, ecapa_loss=0.0001942, whisper_loss=0.106, over 22825.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01099, ecapa_loss=0.0001605, whisper_loss=0.09136, over 3898921.12 frames. ], batch size: 93, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:44:32,803 INFO [train_multi_KD3.py:844] (0/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS 2024-08-13 12:44:49,672 INFO [train_multi_KD3.py:844] (0/4) A total of 64 cuts. 
23 from LS+wenet, 16 from Vox, 25 from AS 2024-08-13 12:44:55,294 INFO [train_multi_KD3.py:844] (0/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS 2024-08-13 12:45:03,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2171230.0, ans=0.0 2024-08-13 12:45:08,896 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 from AS 2024-08-13 12:45:34,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2171330.0, ans=0.1 2024-08-13 12:45:43,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.405e+01 2.774e+01 3.077e+01 4.390e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-13 12:45:49,102 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14250, loss[loss=0.09654, beats_loss=0.01014, ecapa_loss=0.0001546, whisper_loss=0.08486, over 15064.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01088, ecapa_loss=0.0001602, whisper_loss=0.09196, over 3877434.05 frames. ], batch size: 58, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:45:49,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2171430.0, ans=0.1 2024-08-13 12:45:50,863 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 36 from LS+wenet, 18 from Vox, 31 from AS 2024-08-13 12:45:51,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2171430.0, ans=0.125 2024-08-13 12:45:56,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2171430.0, ans=0.125 2024-08-13 12:46:01,000 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 
30 from LS+wenet, 19 from Vox, 41 from AS 2024-08-13 12:46:22,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2171530.0, ans=0.125 2024-08-13 12:46:25,139 INFO [train_multi_KD3.py:844] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-13 12:46:25,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2171630.0, ans=0.035 2024-08-13 12:46:51,535 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 19 from LS+wenet, 27 from Vox, 40 from AS 2024-08-13 12:46:57,570 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 from AS 2024-08-13 12:47:05,439 INFO [train_multi_KD3.py:844] (0/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 from AS 2024-08-13 12:47:13,357 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14300, loss[loss=0.08687, beats_loss=0.01186, ecapa_loss=0.0001729, whisper_loss=0.07328, over 21444.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01088, ecapa_loss=0.0001596, whisper_loss=0.09158, over 3899026.16 frames. ], batch size: 88, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:47:27,523 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-13 12:47:40,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2172030.0, ans=0.0 2024-08-13 12:47:48,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2172130.0, ans=0.0 2024-08-13 12:47:49,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. 
limit=6.0 2024-08-13 12:47:58,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2172230.0, ans=0.125 2024-08-13 12:48:01,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2172230.0, ans=0.125 2024-08-13 12:48:03,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2024-08-13 12:48:28,797 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-13 12:48:30,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.493e+01 2.701e+01 3.105e+01 1.229e+02, threshold=5.402e+01, percent-clipped=5.0 2024-08-13 12:48:31,004 INFO [train_multi_KD3.py:844] (0/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-13 12:48:34,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14350, loss[loss=0.08189, beats_loss=0.01051, ecapa_loss=0.0001723, whisper_loss=0.06966, over 17228.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001609, whisper_loss=0.09122, over 3874595.83 frames. ], batch size: 71, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:48:36,417 INFO [train_multi_KD3.py:844] (0/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 12:48:37,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2024-08-13 12:48:53,856 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 
32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-13 12:49:13,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2172630.0, ans=0.0 2024-08-13 12:49:19,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2172630.0, ans=0.0 2024-08-13 12:49:23,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2172730.0, ans=0.0 2024-08-13 12:49:36,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2172830.0, ans=0.0 2024-08-13 12:49:36,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2172830.0, ans=0.125 2024-08-13 12:49:41,891 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 40 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-13 12:49:50,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=15.0 2024-08-13 12:49:54,122 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14400, loss[loss=0.1052, beats_loss=0.01086, ecapa_loss=0.0001802, whisper_loss=0.09253, over 16640.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.01087, ecapa_loss=0.0001605, whisper_loss=0.09185, over 3882576.69 frames. ], batch size: 69, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:50:00,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=12.0 2024-08-13 12:50:14,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2173030.0, ans=0.1 2024-08-13 12:50:35,224 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 
15 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-13 12:50:38,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2173130.0, ans=0.1 2024-08-13 12:51:01,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2173330.0, ans=0.035 2024-08-13 12:51:02,844 INFO [train_multi_KD3.py:844] (0/4) A total of 81 cuts. 22 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-13 12:51:04,356 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 31 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-13 12:51:13,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.394e+01 2.605e+01 2.942e+01 4.760e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-13 12:51:17,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 15, batch 14450, loss[loss=0.1162, beats_loss=0.008565, ecapa_loss=0.0002046, whisper_loss=0.1056, over 21988.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01081, ecapa_loss=0.0001619, whisper_loss=0.09219, over 3913454.76 frames. ], batch size: 91, lr: 4.12e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:51:20,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2173430.0, ans=0.0 2024-08-13 12:51:24,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2024-08-13 12:51:26,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.41 vs. limit=10.0 2024-08-13 12:51:31,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2173530.0, ans=0.125 2024-08-13 12:51:42,458 INFO [train_multi_KD3.py:844] (0/4) A total of 82 cuts. 
30 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-13 12:51:51,293 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-13 12:51:59,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2173630.0, ans=0.1 2024-08-13 12:52:05,668 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.453e+01 2024-08-13 12:52:10,585 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-13 12:52:15,439 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-15.pt 2024-08-13 12:52:46,854 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-13 12:52:47,909 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 0, loss[loss=0.1045, beats_loss=0.00927, ecapa_loss=0.0001665, whisper_loss=0.09354, over 22408.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.00927, ecapa_loss=0.0001665, whisper_loss=0.09354, over 22408.00 frames. ], batch size: 92, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:52:47,910 INFO [train_multi_KD3.py:1139] (0/4) Computing validation loss 2024-08-13 12:53:29,349 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005644, whisper_loss=0.2485, over 922467.00 frames. 2024-08-13 12:53:45,305 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on SV_voxceleb1: loss=0.00454, beats_loss=0, ecapa_loss=0.000454, whisper_loss=0, over 939242.00 frames. 
2024-08-13 12:55:41,359 INFO [train_multi_KD3.py:1149] (0/4) Epoch 16, validation on AT_audioset: loss=0.02377, beats_loss=0.02377, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-13 12:55:41,363 INFO [train_multi_KD3.py:1155] (0/4) Maximum memory allocated so far is 32381MB 2024-08-13 12:57:07,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2174110.0, ans=0.0 2024-08-13 12:57:23,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2174210.0, ans=0.2 2024-08-13 12:57:25,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2174210.0, ans=0.07 2024-08-13 12:57:30,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=42.69 vs. limit=22.5 2024-08-13 12:57:47,632 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 50, loss[loss=0.1233, beats_loss=0.009859, ecapa_loss=0.0001592, whisper_loss=0.1118, over 23102.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.009676, ecapa_loss=0.0001682, whisper_loss=0.09128, over 902191.82 frames. ], batch size: 88, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:58:11,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.624e+01 2.946e+01 3.270e+01 5.312e+01, threshold=5.891e+01, percent-clipped=1.0 2024-08-13 12:58:41,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2174510.0, ans=0.0 2024-08-13 12:58:57,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2174610.0, ans=0.125 2024-08-13 12:58:58,863 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-13 12:59:34,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2174710.0, ans=0.1 2024-08-13 12:59:43,492 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 100, loss[loss=0.107, beats_loss=0.01103, ecapa_loss=0.0001228, whisper_loss=0.09476, over 17527.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.0001631, whisper_loss=0.08922, over 1583138.14 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 12:59:59,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2174810.0, ans=0.125 2024-08-13 13:00:08,764 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 30 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-13 13:00:19,738 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-13 13:00:22,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2174910.0, ans=0.1 2024-08-13 13:00:25,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-13 13:00:31,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2175010.0, ans=0.0 2024-08-13 13:00:33,599 INFO [train_multi_KD3.py:844] (0/4) A total of 96 cuts. 28 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-13 13:00:36,111 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 31 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-13 13:01:01,264 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 22 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-13 13:01:02,900 INFO [train_multi_KD3.py:844] (0/4) A total of 63 cuts. 
21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-13 13:01:08,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2175110.0, ans=0.0 2024-08-13 13:01:27,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2175210.0, ans=0.2 2024-08-13 13:01:27,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2175210.0, ans=0.1 2024-08-13 13:01:34,187 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 150, loss[loss=0.09881, beats_loss=0.01063, ecapa_loss=0.0001754, whisper_loss=0.08642, over 22691.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01013, ecapa_loss=0.0001636, whisper_loss=0.09042, over 2096617.56 frames. ], batch size: 97, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:01:38,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2024-08-13 13:01:39,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2175310.0, ans=0.2 2024-08-13 13:01:52,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.691e+01 2.914e+01 3.205e+01 4.939e+01, threshold=5.827e+01, percent-clipped=0.0 2024-08-13 13:02:04,591 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.675e-03 2024-08-13 13:02:06,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-08-13 13:02:25,637 INFO [train_multi_KD3.py:844] (0/4) A total of 72 cuts. 
25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-13 13:02:38,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2175610.0, ans=0.025 2024-08-13 13:02:40,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2175710.0, ans=0.0 2024-08-13 13:02:40,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2175710.0, ans=0.0 2024-08-13 13:02:44,764 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 15 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-13 13:02:53,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2175710.0, ans=0.125 2024-08-13 13:02:56,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2175810.0, ans=0.1 2024-08-13 13:02:57,004 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 200, loss[loss=0.1194, beats_loss=0.009178, ecapa_loss=0.000151, whisper_loss=0.1087, over 22668.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01014, ecapa_loss=0.0001634, whisper_loss=0.09147, over 2485033.81 frames. ], batch size: 87, lr: 3.98e-03, grad_scale: 5.764607523034235e+17 2024-08-13 13:03:04,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2175810.0, ans=0.0 2024-08-13 13:03:37,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-13 13:03:45,573 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 13:03:50,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. 
limit=12.0 2024-08-13 13:03:59,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2176210.0, ans=0.125 2024-08-13 13:04:12,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2176310.0, ans=0.125 2024-08-13 13:04:13,407 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 250, loss[loss=0.1187, beats_loss=0.009916, ecapa_loss=0.0001437, whisper_loss=0.1074, over 19366.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01025, ecapa_loss=0.0001628, whisper_loss=0.09201, over 2776898.48 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:04:27,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+01 2.289e+01 2.601e+01 2.843e+01 4.467e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-13 13:04:35,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=2176410.0, ans=0.1 2024-08-13 13:04:43,882 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-13 13:04:45,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2176510.0, ans=0.125 2024-08-13 13:04:45,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2176510.0, ans=0.125 2024-08-13 13:04:51,575 INFO [train_multi_KD3.py:844] (0/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-13 13:04:58,357 INFO [train_multi_KD3.py:844] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 13:05:02,915 INFO [train_multi_KD3.py:844] (0/4) A total of 94 cuts. 
25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-13 13:05:03,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-08-13 13:05:04,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2176610.0, ans=0.125 2024-08-13 13:05:07,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2024-08-13 13:05:25,184 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 300, loss[loss=0.09772, beats_loss=0.01042, ecapa_loss=0.000174, whisper_loss=0.08556, over 22301.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01033, ecapa_loss=0.0001632, whisper_loss=0.09132, over 3011003.15 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:05:29,833 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-13 13:05:31,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2176810.0, ans=0.1 2024-08-13 13:05:57,194 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
31 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-13 13:06:01,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2177010.0, ans=0.125 2024-08-13 13:06:03,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2177010.0, ans=0.125 2024-08-13 13:06:06,389 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-13 13:06:12,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2024-08-13 13:06:20,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2177110.0, ans=0.125 2024-08-13 13:06:32,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2177210.0, ans=0.0 2024-08-13 13:06:34,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-08-13 13:06:36,788 INFO [train_multi_KD3.py:844] (0/4) A total of 71 cuts. 21 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 13:06:38,063 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 350, loss[loss=0.09271, beats_loss=0.01152, ecapa_loss=0.0001757, whisper_loss=0.07943, over 17334.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001602, whisper_loss=0.09034, over 3208707.28 frames. 
], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:06:44,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2177310.0, ans=0.125 2024-08-13 13:06:52,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.376e+01 2.584e+01 2.917e+01 1.097e+02, threshold=5.167e+01, percent-clipped=1.0 2024-08-13 13:07:07,713 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-13 13:07:25,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2177610.0, ans=0.2 2024-08-13 13:07:42,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2177710.0, ans=0.0 2024-08-13 13:07:45,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2177710.0, ans=0.125 2024-08-13 13:07:49,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2177710.0, ans=0.125 2024-08-13 13:07:51,030 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 400, loss[loss=0.08896, beats_loss=0.01189, ecapa_loss=0.0001668, whisper_loss=0.07541, over 18066.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001601, whisper_loss=0.08954, over 3354116.98 frames. ], batch size: 76, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:07:51,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2177810.0, ans=0.0 2024-08-13 13:07:53,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.87 vs. 
limit=15.0 2024-08-13 13:08:01,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2177810.0, ans=0.125 2024-08-13 13:08:09,666 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 21 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-13 13:08:26,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2178010.0, ans=0.125 2024-08-13 13:08:29,530 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 32 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-13 13:08:29,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-13 13:08:51,520 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-13 13:08:51,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2178210.0, ans=0.1 2024-08-13 13:08:55,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-13 13:09:02,107 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 450, loss[loss=0.1141, beats_loss=0.00898, ecapa_loss=0.0001565, whisper_loss=0.1036, over 18592.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001609, whisper_loss=0.09059, over 3462100.05 frames. ], batch size: 68, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:09:16,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.363e+01 2.643e+01 2.945e+01 6.968e+01, threshold=5.285e+01, percent-clipped=1.0 2024-08-13 13:09:41,797 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 
24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-13 13:09:42,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2178510.0, ans=0.1 2024-08-13 13:09:47,506 INFO [train_multi_KD3.py:844] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-13 13:09:57,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2178710.0, ans=0.2 2024-08-13 13:10:02,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2178710.0, ans=0.125 2024-08-13 13:10:10,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2178710.0, ans=0.1 2024-08-13 13:10:14,137 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 500, loss[loss=0.1115, beats_loss=0.009245, ecapa_loss=0.0001314, whisper_loss=0.1009, over 19707.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001605, whisper_loss=0.09009, over 3534361.21 frames. ], batch size: 73, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:10:17,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2178810.0, ans=0.0 2024-08-13 13:10:27,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=15.0 2024-08-13 13:10:27,636 INFO [train_multi_KD3.py:844] (0/4) A total of 73 cuts. 24 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-13 13:10:27,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2178910.0, ans=0.1 2024-08-13 13:10:46,936 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-13 13:10:53,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-13 13:11:00,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2179110.0, ans=0.125 2024-08-13 13:11:02,027 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 26 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-13 13:11:22,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2179210.0, ans=0.125 2024-08-13 13:11:22,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2179210.0, ans=0.2 2024-08-13 13:11:28,320 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 550, loss[loss=0.112, beats_loss=0.009714, ecapa_loss=0.000183, whisper_loss=0.1004, over 14709.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001607, whisper_loss=0.09008, over 3583552.13 frames. ], batch size: 60, lr: 3.98e-03, grad_scale: 1.152921504606847e+18 2024-08-13 13:11:35,018 INFO [train_multi_KD3.py:844] (0/4) A total of 68 cuts. 21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-13 13:11:39,238 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 
22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-13 13:11:43,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.371e+01 2.596e+01 2.960e+01 4.995e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-13 13:11:45,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2179410.0, ans=0.125 2024-08-13 13:12:06,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2179510.0, ans=0.125 2024-08-13 13:12:06,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-13 13:12:08,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2024-08-13 13:12:16,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2179610.0, ans=0.2 2024-08-13 13:12:25,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2179710.0, ans=0.2 2024-08-13 13:12:30,388 INFO [train_multi_KD3.py:844] (0/4) A total of 56 cuts. 13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-13 13:12:40,844 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 600, loss[loss=0.107, beats_loss=0.01124, ecapa_loss=0.0001633, whisper_loss=0.0941, over 18325.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01074, ecapa_loss=0.0001601, whisper_loss=0.09011, over 3665373.79 frames. 
], batch size: 73, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:12:42,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2179810.0, ans=0.0
2024-08-13 13:12:43,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2179810.0, ans=0.07
2024-08-13 13:12:59,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2179910.0, ans=0.2
2024-08-13 13:13:32,775 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 from AS
2024-08-13 13:13:42,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2180210.0, ans=0.0
2024-08-13 13:13:48,176 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 from AS
2024-08-13 13:13:53,478 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 650, loss[loss=0.089, beats_loss=0.01367, ecapa_loss=0.0001786, whisper_loss=0.07354, over 20793.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001612, whisper_loss=0.09038, over 3669429.64 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:14:07,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2180410.0, ans=15.0
2024-08-13 13:14:08,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.428e+01 2.791e+01 3.201e+01 6.340e+01, threshold=5.582e+01, percent-clipped=1.0
2024-08-13 13:14:08,441 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 from AS
2024-08-13 13:14:09,750 INFO [train_multi_KD3.py:844] (0/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 from AS
2024-08-13 13:14:23,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2180510.0, ans=0.125
2024-08-13 13:14:35,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2180510.0, ans=0.0
2024-08-13 13:14:54,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2180710.0, ans=0.125
2024-08-13 13:14:59,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=15.0
2024-08-13 13:15:00,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2180710.0, ans=0.125
2024-08-13 13:15:00,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2180710.0, ans=0.0
2024-08-13 13:15:03,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2180710.0, ans=0.125
2024-08-13 13:15:04,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2180710.0, ans=0.1
2024-08-13 13:15:06,852 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 700, loss[loss=0.0768, beats_loss=0.01036, ecapa_loss=0.0001671, whisper_loss=0.06476, over 16569.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001618, whisper_loss=0.09029, over 3686598.42 frames. ], batch size: 67, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:15:08,791 INFO [train_multi_KD3.py:844] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS
2024-08-13 13:15:30,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=15.0
2024-08-13 13:15:33,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2180910.0, ans=0.125
2024-08-13 13:15:35,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.81 vs. limit=22.5
2024-08-13 13:15:36,176 INFO [train_multi_KD3.py:844] (0/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 from AS
2024-08-13 13:16:08,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2181210.0, ans=0.0
2024-08-13 13:16:09,754 INFO [train_multi_KD3.py:844] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS
2024-08-13 13:16:22,125 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 750, loss[loss=0.09619, beats_loss=0.0118, ecapa_loss=0.0001712, whisper_loss=0.08269, over 17834.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001606, whisper_loss=0.09039, over 3739754.93 frames. ], batch size: 71, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:16:24,838 INFO [train_multi_KD3.py:844] (0/4) A total of 53 cuts. 12 from LS+wenet, 18 from Vox, 23 from AS
2024-08-13 13:16:30,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2181310.0, ans=0.125
2024-08-13 13:16:37,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.319e+01 2.745e+01 2.985e+01 9.286e+01, threshold=5.489e+01, percent-clipped=1.0
2024-08-13 13:16:37,930 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS
2024-08-13 13:16:40,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2181410.0, ans=0.125
2024-08-13 13:16:52,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2181510.0, ans=0.125
2024-08-13 13:16:58,839 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-13 13:16:58,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2181510.0, ans=0.0
2024-08-13 13:17:04,720 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 from AS
2024-08-13 13:17:08,981 INFO [train_multi_KD3.py:844] (0/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 from AS
2024-08-13 13:17:37,915 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 800, loss[loss=0.09495, beats_loss=0.01035, ecapa_loss=0.0001724, whisper_loss=0.08287, over 19424.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01068, ecapa_loss=0.0001603, whisper_loss=0.09059, over 3779489.96 frames. ], batch size: 79, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:17:54,365 INFO [train_multi_KD3.py:844] (0/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 from AS
2024-08-13 13:18:10,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2182010.0, ans=0.0
2024-08-13 13:18:16,566 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 15 from LS+wenet, 21 from Vox, 34 from AS
2024-08-13 13:18:24,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2182110.0, ans=0.125
2024-08-13 13:18:36,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2182210.0, ans=0.125
2024-08-13 13:18:38,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2182210.0, ans=0.125
2024-08-13 13:18:44,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2182210.0, ans=0.125
2024-08-13 13:18:52,704 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 850, loss[loss=0.1137, beats_loss=0.01038, ecapa_loss=0.0001258, whisper_loss=0.102, over 19043.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.0001605, whisper_loss=0.09054, over 3801159.02 frames. ], batch size: 69, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:19:08,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.299e+01 2.538e+01 2.916e+01 7.643e+01, threshold=5.076e+01, percent-clipped=1.0
2024-08-13 13:19:25,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5
2024-08-13 13:19:27,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2182510.0, ans=0.125
2024-08-13 13:20:07,945 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 900, loss[loss=0.1117, beats_loss=0.01146, ecapa_loss=0.0001374, whisper_loss=0.09883, over 24371.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01069, ecapa_loss=0.0001593, whisper_loss=0.08995, over 3820928.70 frames. ], batch size: 92, lr: 3.98e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:20:22,353 INFO [train_multi_KD3.py:844] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS
2024-08-13 13:20:24,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2182910.0, ans=0.125
2024-08-13 13:20:39,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0
2024-08-13 13:20:42,707 INFO [train_multi_KD3.py:844] (0/4) A total of 59 cuts. 18 from LS+wenet, 10 from Vox, 31 from AS
2024-08-13 13:20:59,102 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 28 from LS+wenet, 16 from Vox, 47 from AS
2024-08-13 13:20:59,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0
2024-08-13 13:21:14,220 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 13 from LS+wenet, 22 from Vox, 22 from AS
2024-08-13 13:21:23,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=8.0
2024-08-13 13:21:27,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2183210.0, ans=0.07
2024-08-13 13:21:35,561 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 950, loss[loss=0.09006, beats_loss=0.01208, ecapa_loss=0.0001352, whisper_loss=0.07662, over 16100.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001592, whisper_loss=0.08946, over 3795168.35 frames. ], batch size: 62, lr: 3.97e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:21:36,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2183310.0, ans=0.125
2024-08-13 13:21:53,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.345e+01 2.599e+01 2.801e+01 4.371e+01, threshold=5.198e+01, percent-clipped=0.0
2024-08-13 13:21:53,393 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 from AS
2024-08-13 13:22:09,370 INFO [train_multi_KD3.py:844] (0/4) A total of 54 cuts. 12 from LS+wenet, 11 from Vox, 31 from AS
2024-08-13 13:22:13,405 INFO [train_multi_KD3.py:844] (0/4) A total of 76 cuts. 28 from LS+wenet, 22 from Vox, 26 from AS
2024-08-13 13:22:19,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2183510.0, ans=0.125
2024-08-13 13:22:23,537 INFO [train_multi_KD3.py:844] (0/4) A total of 57 cuts. 20 from LS+wenet, 18 from Vox, 19 from AS
2024-08-13 13:22:51,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2183710.0, ans=0.1
2024-08-13 13:23:14,099 INFO [train_multi_KD3.py:1116] (0/4) Epoch 16, batch 1000, loss[loss=0.09138, beats_loss=0.0104, ecapa_loss=0.0001838, whisper_loss=0.07914, over 19067.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01075, ecapa_loss=0.0001584, whisper_loss=0.08893, over 3801216.58 frames. ], batch size: 77, lr: 3.97e-03, grad_scale: 1.152921504606847e+18
2024-08-13 13:23:25,352 INFO [train_multi_KD3.py:844] (0/4) A total of 91 cuts. 34 from LS+wenet, 27 from Vox, 30 from AS
2024-08-13 13:23:27,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2183810.0, ans=0.125
2024-08-13 13:24:00,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2184010.0, ans=0.07
2024-08-13 13:24:01,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-08-13 13:24:07,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0
2024-08-13 13:24:18,656 INFO [train_multi_KD3.py:844] (0/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 from AS
2024-08-13 13:24:21,529 INFO [train_multi_KD3.py:844] (0/4) A total of 65 cuts. 19 from LS+wenet, 14 from Vox, 32 from AS
2024-08-13 13:24:29,454 INFO [train_multi_KD3.py:844] (0/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 from AS
2024-08-13 13:24:55,408 INFO [train_multi_KD3.py:844] (0/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 from AS
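Note on the per-batch loss lines: the logged `loss` is consistent with a weighted sum of the three distillation losses using the scales from the run config (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`). A minimal sketch of that relationship (the function name is illustrative, not the actual icefall API):

```python
# Hedged sketch: combine the per-teacher KD losses into the logged `loss`.
# Scale defaults are taken from this run's config; `total_loss` is an
# illustrative name, not a function from train_multi_KD3.py.
def total_loss(beats_loss, ecapa_loss, whisper_loss,
               beats_scale=1.0, ecapa_scale=10.0, whisper_scale=1.0):
    return (beats_scale * beats_loss
            + ecapa_scale * ecapa_loss
            + whisper_scale * whisper_loss)

# Reproduces the batch-650 entry: loss=0.089 with beats_loss=0.01367,
# ecapa_loss=0.0001786, whisper_loss=0.07354.
print(round(total_loss(0.01367, 0.0001786, 0.07354), 3))  # → 0.089
```

The same check holds for the other logged batches, which is why the tiny `ecapa_loss` values (~1.6e-04) still contribute visibly: they are multiplied by 10.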
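Note on the `[optim.py:476]` lines: they report the min/Q1/median/Q3/max of recent gradient norms plus a clipping threshold, and in every logged instance the threshold equals `Clipping_scale` (2.0) times the median (e.g. 2.0 × 2.791e+01 = 5.582e+01). A hedged sketch of that apparent rule, not the actual optim.py implementation:

```python
import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    """Illustrative rule inferred from the log: the gradient-clipping
    threshold is clipping_scale times the median of recent grad norms.
    `percent-clipped` then reports how often norms exceeded it."""
    return clipping_scale * statistics.median(recent_grad_norms)

# Using the five logged quartile values as a stand-in sample:
print(clipping_threshold([18.39, 24.28, 27.91, 32.01, 63.40]))  # → 55.82
```

With the max grad norm (6.340e+01) above the threshold (5.582e+01) and the quartiles below it, a `percent-clipped` of roughly 1% is plausible, matching the log.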
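Note on the `[scaling.py:214]` lines: `ScheduledFloat` prints hyperparameters (skip rates, balancer probabilities, dropout) whose value `ans` depends on `batch_count`. A minimal illustrative re-implementation, assuming piecewise-linear interpolation between `(batch_count, value)` breakpoints; the class body and the breakpoints below are made up for illustration and are not the Zipformer scaling.py code:

```python
import bisect

class ScheduledFloat:
    """Sketch: a float hyperparameter scheduled over batch_count by
    piecewise-linear interpolation between sorted breakpoints."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.5), (2000.0, 0.025)
        self.points = sorted(points)

    def value(self, batch_count):
        xs = [x for x, _ in self.points]
        if batch_count <= xs[0]:
            return self.points[0][1]   # clamp before the first breakpoint
        if batch_count >= xs[-1]:
            return self.points[-1][1]  # clamp after the last breakpoint
        i = bisect.bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# Hypothetical skip-rate schedule: decays from 0.5 to 0 over 4000 batches.
skip_rate = ScheduledFloat((0.0, 0.5), (2000.0, 0.025), (4000.0, 0.0))
print(skip_rate.value(3000.0))  # → 0.0125
```

At batch_count ≈ 2.18e6, far past any warm-up breakpoints, such schedules would sit at their final values, which is consistent with the constant `ans` values in this stretch of the log.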